Cyber Security with Artificial Intelligence
This article is written for people who are interested in cyber security and artificial intelligence.
What is the difference between Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning?
Artificial intelligence is a field of science concerned with solving complex problems the way humans do. It uses algorithms to model a decision mechanism similar to that of a real human. Machine learning is a subdomain of artificial intelligence. It uses mathematical and statistical methods to extract information from data, and with that information it tries to predict the unknown. Deep learning is a subdomain of machine learning that learns from data using artificial neural networks.
What exactly is AI the solution for? What does AI do better than a human?
Artificial intelligence (AI) is the product of software that tries to create a decision mechanism similar to that of the human brain. In its early years, however, AI could not become an exact imitation of the human brain. As science and technology advanced, it became clear that the human brain is far too complicated to be modelled in software. In the following years, researchers turned to decision mechanisms in more specific fields instead of modelling the exact structure of the human brain. To date, artificial intelligence research has mostly focused on solving one specific problem at a time.
The purpose of AI applications is to solve, in a very short time, something that an expert spends a considerable amount of time on. For instance, consider a doctor who specializes in cancer: the process by which the doctor detects cancer cells can be modelled with an artificial intelligence technique. Given suitable training data, the resulting model can aim to detect cancer cells about as well as the doctor does. In addition, the software can be used by everyone, so hospitals that do not have specialist doctors in this area can also perform successful detection.
More generally, any problem that requires an expert can be modelled using artificial intelligence techniques, as long as appropriate data is available. Appropriate data means data that captures the features of that specific problem. For example, to detect cancer cells, distinctive features such as the cell’s size, its growth rate, the enzymes it secretes, and its mutation rate need to be collected. If the features related to the problem can be collected, the problem can be solved using machine learning techniques.
For most problems where an expert cannot analyze the sheer amount of data in a reasonable time, artificial intelligence applications can produce successful and fast solutions.
Which types of artificial intelligence applications are being used in cyber security solutions?
The possibilities are limited only by human imagination. For the sake of clarity, the following application categories can be examined:
- Spam Filter Applications (e.g. SpamAssassin)
- Network Intrusion Detection and Prevention
- Fraud detection
- Credit scoring and next-best offers
- Botnet Detection
- Secure User Authentication
- Cyber security Ratings
- Hacking Incident Forecasting
How can an artificial intelligence application that does malware analysis be used?
Artificial intelligence can be used to detect whether a piece of software is malware or normal software. To develop an artificial intelligence application that does malware detection, the first step is to determine some distinctive features. The system is then trained on those features, extracted from samples of both harmless software and malware.
Here are some features that can be used when analyzing a piece of software:
- Accessed APIs
- Accessed locations on the disk
- Accessed peripheral devices (camera, keyboard, etc.)
- Consumed processor power
- Consumed bandwidth
- Amount of data transmitted over the internet
The system is built using these distinctive features. Once you give the system a piece of software to test, it tries to determine whether the software is malware by analyzing the same features.
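The train-then-test workflow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the feature values are invented, the feature order is a hypothetical choice, and a simple k-nearest-neighbour vote stands in for whatever algorithm a real product would use.

```python
import math

# Hypothetical feature vectors, all values invented for illustration:
# [sensitive API calls, disk locations touched, devices accessed,
#  average CPU %, MB sent over the network]
# Label 1 = malware, label 0 = harmless software.
TRAINING_SAMPLES = [
    ([2, 5, 0, 3.0, 0.1], 0),
    ([1, 3, 0, 1.5, 0.0], 0),
    ([4, 8, 1, 6.0, 0.5], 0),
    ([3, 6, 0, 4.0, 0.2], 0),
    ([40, 120, 2, 55.0, 30.0], 1),
    ([35, 90, 2, 70.0, 45.0], 1),
    ([50, 200, 1, 65.0, 25.0], 1),
    ([45, 150, 2, 80.0, 60.0], 1),
]


def feature_ranges(rows):
    """Return the (min, max) of every feature column."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale(row, ranges):
    """Min-max scale a row so that no single feature dominates the distance."""
    return [
        (value - low) / (high - low) if high > low else 0.0
        for value, (low, high) in zip(row, ranges)
    ]


def classify_sample(sample, training=TRAINING_SAMPLES, k=3):
    """Label a sample by majority vote among its k nearest training samples."""
    ranges = feature_ranges([features for features, _ in training])
    scaled_sample = scale(sample, ranges)
    neighbours = sorted(
        (math.dist(scaled_sample, scale(features, ranges)), label)
        for features, label in training
    )
    votes = [label for _, label in neighbours[:k]]
    return 1 if sum(votes) * 2 > k else 0


suspicious = classify_sample([42, 130, 2, 60.0, 35.0])
ordinary = classify_sample([2, 4, 0, 2.0, 0.1])
print("suspicious sample ->", "malware" if suspicious else "harmless")
print("ordinary sample   ->", "malware" if ordinary else "harmless")
```

In practice the features would be measured by running or inspecting the software, and the training set would need to be far larger than eight samples.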
Is artificial intelligence (AI) used to detect cyber attacks, and what is its success rate?
Yes, AI can be used to detect cyber attacks. There is plenty of academic research on detecting cyber attacks using artificial intelligence. The success rates reported in those studies vary between 85% and 99%.
In the last few years, in addition to academic research, products such as Darktrace have been developed to detect cyber attacks with the help of artificial intelligence. Darktrace claims a success rate of more than 99% together with a very low rate of false positives. For more details, you can check the company’s website.
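A "success rate" only tells part of the story, which is why the false positive rate matters as well. The sketch below shows one common way to compute both from a list of true and predicted labels. The function name, the metric names, and the traffic numbers are assumptions made for this illustration; they are not taken from any of the studies or products mentioned above.

```python
def detection_metrics(y_true, y_pred):
    """Compute basic detection metrics; label 1 = attack, label 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # attacks caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # normal events left alone
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # attacks missed

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "detection_rate": ratio(tp, tp + fn),
        "false_positive_rate": ratio(fp, fp + tn),
    }


# Hypothetical traffic: 1000 events, of which only 10 are attacks.
y_true = [1] * 10 + [0] * 990

# A detector that never raises an alert.
silent_detector = [0] * 1000

# A detector that catches 9 of the 10 attacks but raises 20 false alarms.
useful_detector = [1] * 9 + [0] + [1] * 20 + [0] * 970

silent_metrics = detection_metrics(y_true, silent_detector)
useful_metrics = detection_metrics(y_true, useful_detector)
print(silent_metrics)
print(useful_metrics)
```

With these invented numbers, the detector that never raises an alert still reaches 99% accuracy while catching none of the attacks, so accuracy should be read together with the detection rate and the false positive rate.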
What are the open source machine learning library options?
First of all, it doesn’t matter which language you use for machine learning. What really matters is the algorithmic approach. As long as you know the machine learning algorithms, you can develop a machine learning application in the programming language of your choice, either by coding the algorithms yourself or by using libraries.
Python is one of the most common languages used in machine learning. It is an open source language with easy access to a large number of libraries for different purposes.
The libraries commonly used for machine learning are:
- scikit-learn (sklearn): A large library with many algorithms. It makes it possible to run the algorithm you want in just a few lines of code.
- NumPy: Machine learning applications involve a lot of complex statistical and mathematical calculations. For that reason, NumPy, a library of mathematical functions, is essential for machine learning applications.
- pandas: The pandas library is used to process data quickly and effectively.
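As a rough illustration of how these three libraries fit together, the sketch below generates synthetic data with NumPy, holds it in a pandas DataFrame, and trains a scikit-learn classifier on it. The column names `api_calls` and `mb_sent`, the distributions, and the choice of Random Forest are assumptions made for this example only.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# NumPy: generate synthetic, well-separated measurements (invented values).
rng = np.random.default_rng(0)
benign = rng.normal(loc=[5.0, 1.0], scale=[2.0, 0.5], size=(50, 2))
malicious = rng.normal(loc=[40.0, 20.0], scale=[5.0, 5.0], size=(50, 2))

# pandas: keep the features and labels in one labelled table.
feature_columns = ["api_calls", "mb_sent"]
df = pd.DataFrame(np.vstack([benign, malicious]), columns=feature_columns)
df["label"] = [0] * 50 + [1] * 50  # 0 = harmless, 1 = malware

# scikit-learn: create the model, train it, and predict.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(df[feature_columns], df["label"])

new_samples = pd.DataFrame({"api_calls": [4.0, 45.0], "mb_sent": [0.8, 25.0]})
predictions = model.predict(new_samples)
print(predictions)
```

The scikit-learn part really is only a handful of lines: create the model, call `fit`, call `predict`. Most of the effort in a real project goes into collecting and cleaning the data.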
Which sources can I use if I want to get information about artificial intelligence?
Beginners who want to read the latest articles about artificial intelligence can follow these blogs:
You can also find artificial intelligence research related to cyber security at covert.io (http://www.covert.io). You can also check NormShield’s blog page (https://www.normshield.com/blog) for articles about cyber security and machine learning.
Does the cyber security domain differ from other machine learning domains?
Machine learning is used in a tremendous number of applications. In most of them, the thing we want to detect can be clearly defined. In contrast, in some cyber security problems, the thing we want to detect is not explicitly defined. In addition, the cyber security domain requires working with the most up-to-date data, and obtaining that data is one of the challenges of this domain.
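When the thing to detect cannot be defined in advance, one common workaround is to model what "normal" looks like and flag whatever deviates from it. The sketch below does this with a simple z-score over a baseline of measurements. The traffic figures, the function name, and the threshold of 3 standard deviations are assumptions for illustration, not a recommendation for production use.

```python
import statistics


def find_anomalies(baseline, observations, threshold=3.0):
    """Return the observations that lie more than `threshold` standard
    deviations away from the mean of the baseline measurements."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    if stdev == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return [x for x in observations if x != mean]
    return [x for x in observations if abs(x - mean) / stdev > threshold]


# Hypothetical outbound traffic of one host in MB per hour (invented values).
normal_traffic = [120, 135, 110, 128, 140, 125, 132, 118]
todays_traffic = [130, 122, 900, 115]

anomalies = find_anomalies(normal_traffic, todays_traffic)
print(anomalies)
```

Note that the baseline itself has to be kept current: if normal behaviour changes and the baseline is not refreshed, legitimate activity starts to look anomalous, which is the data freshness challenge mentioned above.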
Are there any companies that develop cyber security applications using artificial intelligence?
There are many companies that develop cyber security applications using artificial intelligence. The companies that focused on this domain early grew in value in a very short time. Here are some examples:
- Darktrace, founded in 2013, developed a product that performs anomaly detection on a network with machine learning. The company is now worth $825 million.
- Cylance, founded in 2012, developed a product to prevent advanced cyber threats. The company is now worth $1 billion.
The leading companies that use artificial intelligence in the cyber security domain are listed in a report by CB Insights (see the image below):
In the last few years, as artificial intelligence has become more popular, there has been a significant increase in the number of startups that focus on the cyber security domain. According to CB Insights, cyber security ranks 5th among the application areas of artificial intelligence.
Can you give some examples of the machine learning algorithms that are used to develop cyber security applications?
SpamAssassin, for instance, is an open source project that does spam mail filtering. SpamAssassin builds a feature list in order to check whether an email is spam. The features extracted from an analyzed email are processed with the Naive Bayes algorithm. The most common algorithms in cyber attack detection systems include Random Forest, Decision Tree, and Support Vector Machines.
In the last few years, the most widely used machine learning approach has without a doubt been deep learning, a family of machine learning methods based on artificial neural networks. Nowadays, most companies that do artificial intelligence research use this method.
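To make the Naive Bayes idea concrete, here is a minimal word-count version written in plain Python. It is a teaching sketch and not SpamAssassin’s actual implementation: the training mails are invented, the "features" are simply the words of the message, and Laplace (add-one) smoothing is an assumption added so that unseen word counts do not produce a zero probability.

```python
import math
from collections import Counter, defaultdict

# Invented training mails for illustration only.
TRAINING_MAILS = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday", "ham"),
]


def train(mails):
    """Count how often each word appears in each class."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in mails:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, class_counts, vocabulary


def classify_mail(text, model):
    """Return the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_mails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the prior probability of the class.
        score = math.log(class_counts[label] / total_mails)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace smoothing: add 1 to every word count.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_MAILS)
print(classify_mail("free prize now", model))
print(classify_mail("project meeting monday", model))
```

Working with log-probabilities instead of multiplying raw probabilities avoids numerical underflow on long messages.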
Is it possible to detect cyber attacks before they happen?
For a cyber attack to succeed, the attacker has to complete a series of steps successfully. These steps are called the “Cyber Kill Chain”. Attackers might leave traces in some of these steps, or, during the information-gathering phase, they might access previously leaked information about the targeted company. Preventing such situations is only possible if you constantly observe your company through the eyes of an attacker. Knowing in advance what attackers can find when they research your company, and taking precautions accordingly, helps prevent these situations.
NormShield Cyber Risk Scorecard scans most of the information about your company that can be accessed via the internet.
Some of the information that can be accessed about your company includes:
- Hacktivist posts that target your company in dark forums or social media.
- Leaked information about your company’s customers and employees (e-mail addresses, passwords, credit card information, etc.).
- Phishing websites and mobile or desktop applications that impersonate your company.
If you know your online presence well and can manage it, you can reduce the risk of being affected by a cyber attack. Cyber Risk Scorecard lets you access information about your company from various sources and manage that data, so that you can take precautions.