Machine Learning in Cyber Security Domain
By its dictionary definition, authentication (or verification) is a set of independent procedures used together to check that a product, service, user or system meets its requirements and specifications and fulfills its intended purpose. User verification is the mechanism that grants a user permission to log in to an application or system. In an ideal system, no one except the real user can access the user’s account. When the target system is an online service, authentication generally relies on a username and password. These fields are vulnerable to brute force attacks if no preventive measures are taken: an attacker can try every combination until the password is found (trial and error).
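To make the cost of trial and error concrete, the following minimal sketch estimates how long an exhaustive search could take. The alphabet sizes and the guess rate are illustrative assumptions, not figures from this chapter.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def search_space(alphabet_size: int, length: int) -> int:
    """Number of candidate passwords of exactly `length` characters."""
    return alphabet_size ** length


def worst_case_years(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Time needed to try every candidate at a fixed guess rate, in years."""
    return search_space(alphabet_size, length) / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    # Assumption: the online service tolerates 10 login attempts per second.
    rate = 10.0
    for label, alphabet, length in [
        ("4-digit PIN", 10, 4),
        ("8 lowercase letters", 26, 8),
        ("8 printable ASCII characters", 95, 8),
    ]:
        print(f"{label}: {worst_case_years(alphabet, length, rate):.2e} years")
```

The estimate is a worst case for a blind search; dictionary attacks against weak passwords finish far sooner, which is why length and character variety matter.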
It is strongly recommended to use secure passwords that combine numbers, letters and special characters and meet a minimum length. Security-conscious companies maintain password creation policies to make sure that every employee’s password is safe. If a user takes these precautions, cracking the password through online brute force may take years. Security-aware companies store user passwords in the database as hashes, so that even if their systems are breached, the passwords cannot easily be recovered. Of course, the hash algorithm must be strong, such as an adaptive hash algorithm (for example bcrypt). Besides these precautions, additional security mechanisms such as captcha and two-factor authentication are used to prevent unauthorized access.
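The idea of storing only a salted, deliberately slow hash can be sketched as follows. bcrypt is not part of the Python standard library, so this sketch swaps in PBKDF2-HMAC-SHA256 from `hashlib` as a stand-in; the iteration count and the function names are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 600_000  # assumed work factor; tune it for your own hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> Tuple[bytes, bytes, int]:
    """Return (salt, digest, iterations) for storage; the password itself is never stored."""
    salt = os.urandom(16)  # fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest, iterations


def verify_password(password: str, salt: bytes, digest: bytes, iterations: int) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, digest)
```

Raising the iteration count makes every guess more expensive for an attacker who has stolen the database, at the price of a slower login check.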
Captcha is an additional security layer that protects authentication against brute force and dictionary attacks by means of captcha images. Automated brute force tools cannot recognize these images, so they cannot proceed once the authentication mechanism shows one.
Two-factor authentication is another additional security layer that prevents unauthorized access. This type of mechanism uses a piece of additional information that only the real user knows or has. It can be an OTP (one-time password) sent to a pre-registered cell phone, or, in real world applications, biometric information.
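A server-side one-time password check might look like the minimal sketch below. The class name, the 120-second lifetime and the six-digit length are assumptions; delivery to the phone is out of scope, so the code simply returns the value that would be sent.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple


class OtpIssuer:
    """Issues short-lived numeric codes and accepts each one at most once."""

    def __init__(self, ttl_seconds: float = 120.0, digits: int = 6):
        self.ttl_seconds = ttl_seconds
        self.digits = digits
        self._pending: Dict[str, Tuple[str, float]] = {}  # username -> (code, expiry)

    def issue(self, username: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        # secrets gives cryptographically strong randomness; pad with leading zeros.
        code = f"{secrets.randbelow(10 ** self.digits):0{self.digits}d}"
        self._pending[username] = (code, now + self.ttl_seconds)
        return code  # a real system would send this by SMS instead of returning it

    def verify(self, username: str, submitted: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        entry = self._pending.pop(username, None)  # a code is consumed by one attempt
        if entry is None:
            return False
        code, expires_at = entry
        return now <= expires_at and hmac.compare_digest(code, submitted)
```

Consuming the code on the first attempt, whether right or wrong, keeps an attacker from brute forcing the six digits against a single issued code.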
Authentication mechanisms are not used only in web applications. In real world applications, for example when entering a secure facility such as a military building, additional precautions such as biometric verification should be used for authentication.
In this chapter, we explain how various machine learning techniques can make an authentication mechanism stronger. The most popular way to authenticate a user is to use information that is as unique as possible, such as biometric data. However, a biometric verification mechanism requires physical access to enter the authentication information. It also requires specialized sensors, which cost money. Such systems can therefore be used in highly critical real world applications but are not practical for classic web applications. For that reason, some authentication techniques have been developed that use a user’s unique information to authenticate to systems such as web applications without additional sensors.
There are two main categories of authentication mechanisms that utilize machine learning: (1) Biometric Verification and (2) Activity Based Verification.
Biometric verification is an authentication approach that uses unique human information. This information has no duplicate in the world. Physical access is needed to enter the authentication information, so this type of verification is used in real world security mechanisms. The most commonly known biometric verification systems are based on the following:
- Fingerprint Recognition
- Finger Vein Recognition
- Retina and Iris Recognition
- Hand/Palm Recognition
- Voice Recognition
- Signature Recognition
- Face Recognition
These verification techniques are commonly used around the world. For example, fingerprint recognition systems are often seen in ATMs, at entrances to secure facilities and at entrances to employee working areas. Voice recognition is commonly used in call centers to identify customers. Palm recognition is commonly used in medical services to verify patients with a high accuracy rate. Retina and iris recognition is commonly used at secure facility entrances.
This part explains how a system can recognize these patterns using machine learning techniques. Fingerprint recognition, retina and iris recognition, and hand and palm recognition, which are the most commonly used biometric systems, are explained in some detail. The other techniques are explained briefly.
Fingerprint recognition is one of the most commonly used types of biometrics in the world. To recognize a fingerprint, the finger must first be scanned with a fingerprint scanner. The output of the scanner is a single black-and-white image of the finger surface. This image is then processed with image processing techniques, and features are extracted from it.
In the training phase, fingerprint images are collected more than once for every user, and the calculated feature information is saved to a database.
In the test phase, the user’s fingerprint image is collected in real time through the fingerprint scanner. Feature information is then calculated for the test image. Finally, in the most basic approach, this information is compared with the information stored in the database for the real user, using a distance metric. If the test image is similar enough to the user’s training images, authentication succeeds; otherwise it fails. The selected features, the machine learning algorithm used for the decision mechanism and the selected distance metric directly influence the accuracy rate.
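The training and test phases can be sketched in a few lines once the features have been extracted. This sketch assumes each scan has already been reduced to a fixed-length numeric feature vector; the averaging of enrollment samples, the Euclidean metric and the threshold value are illustrative choices, not the method of any particular product.

```python
import math
from statistics import fmean


def enroll(samples):
    """Training phase: average several feature vectors of one user into a template."""
    return [fmean(column) for column in zip(*samples)]


def authenticate(template, probe, threshold):
    """Test phase: accept when the Euclidean distance to the template is small enough."""
    return math.dist(template, probe) <= threshold


# Hypothetical three-dimensional feature vectors from three enrollment scans.
alice_template = enroll([
    [0.20, 0.80, 0.50],
    [0.22, 0.78, 0.52],
    [0.18, 0.82, 0.48],
])

genuine_probe = [0.21, 0.79, 0.51]   # close to the template
impostor_probe = [0.60, 0.30, 0.90]  # far from the template

print(authenticate(alice_template, genuine_probe, threshold=0.1))
print(authenticate(alice_template, impostor_probe, threshold=0.1))
```

The threshold trades false accepts against false rejects: lowering it blocks more impostors but also rejects more genuine scans.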
Fingerprint detection is one of the most commonly used authentication techniques in personal life. Mobile phone manufacturers build fingerprint detection systems into their phones. Such systems make authentication easier for the phone’s owner and harder for other people. This is a good example of the widespread use of machine-learning-based user authentication in daily life.
Retina and Iris Recognition
Retina recognition is a biometric technique that uses the unique patterns on a person’s retina for identification. The retina is the layer of blood vessels situated at the back of the eye. The eye is positioned in front of the system at a capture distance ranging from 8 cm to one meter. The output of the eye scanner is an image of the blood vessels of the retina. Every human’s blood vessel pattern is unique. Features are extracted from the image with image processing techniques, and a decision is then made using machine learning techniques. An example retina image is given on the right.
Another biometric verification technique based on the eye is iris recognition.
The iris is the colored part of the eye and is responsible for controlling the amount of light entering the eye. The iris has a veined structure that is unique for every human in the world. The structure of the iris is extracted with image processing techniques, and a decision mechanism is created using machine learning techniques.
Palm detection is based on two different approaches: the first scans the hand surface, as in fingerprint scanning, and the second extracts the blood vessels of the palm. In hand recognition systems, the hand is scanned by a visual scanner and the surface information of the hand is extracted. In palm detection systems, the palm is scanned by an infrared sensor, and the output is a picture of the blood vessels of the palm. In both techniques, features are extracted by image processing algorithms and a decision mechanism is created using machine learning algorithms.
Voice recognition is a type of signal processing. Every human has a unique voice, and this information is detectable by machine learning techniques. Biometric verification can also be done using finger vein information, signature shapes and face recognition. Accuracy rates of biometric verification techniques are given in the figure below.
Activity Based Verification
Identity theft is a crime in which hackers perpetrate fraudulent activity under stolen identities by using credentials, such as passwords and smartcards, unlawfully obtained from legitimate users or by using logged-on computers that are left unattended. User verification methods provide a security layer in addition to the username and password by continuously validating the identity of logged-on users based on their physiological and behavioral characteristics.
Every person uses an authentication mechanism to log in countless times in a single day. In general, only usernames and passwords are used for authentication to web applications. It is commonly known that companies, even the largest ones, can still be hacked, and that a significant number of them have been hacked already. An individual user’s username and password may already have leaked to the internet or the dark web in clear text without the user’s knowledge.
Now, think about a new type of verification method in which the user creates the credential unconsciously. Even the users themselves cannot describe these “passwords” exactly, because they are based on behavioral knowledge about the user. An example makes this clearer. In this example, the activity based verification mechanism is not used for an online service, but the main idea is the same.
The example comes from the movie Mission: Impossible – Rogue Nation.
Briefly, in the movie there is a secure facility, and the protagonists want to enter it and steal valuable information. The facility has a multi-layered security mechanism. Our interest is in the final step of this mechanism, because it uses an activity based verification technique. In this step, the user walks through a tunnel that is monitored by cameras and other sensors. These sensors analyze the individual’s walking behaviour, which is unique to every person. This step cannot be passed unless the attacker copies this behavioral information, and that is nearly impossible because the information is abstract. With physical biometric verification techniques, attackers know what must be copied to bypass the authentication mechanism: a physical part of the human body, such as a fingerprint or an iris. Behavioral information is abstract knowledge about a human and is very hard to copy. Even if attackers kidnap the real user, they cannot copy this information. You cannot copy something when you do not know what it is or how it works.
In general, the mechanism used in the movie works as described above. If you want to see how it works, we recommend watching the movie. (Note that the protagonists get in by changing the user’s behavioral data stored in the database.)
For the reasons described above, Activity Based Verification is, in our opinion, more relevant to cyber security than physical biometric verification, because this type of verification system is used in online services, which hackers can target directly. The general structure of the mechanism is given in the figure below.
- Captures the events generated by the various input devices used for the interaction (e.g. keyboard, mouse) via their drivers.
- Constructs a signature which characterizes the behavioral biometrics of the user.
- Consists of a machine learning algorithm (e.g. Support Vector Machines, Artificial Neural Networks, etc.) that is used to build the user verification model by training on past behavior, often given by samples. During verification, the induced model is used to classify new samples acquired from the user.
- A database of behavioral signatures that were used to train the model. Upon entry of a username, the signature of the user is retrieved for the verification process.
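The four parts described above can be wired together as in the sketch below. All names (`VerificationPipeline`, `mean_gap`, `distance_to_mean`) are hypothetical, and the toy scoring function only stands in for a trained model such as an SVM or a neural network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Event = dict             # a raw input event, e.g. {"type": "keydown", "t": 0.12}
Signature = List[float]  # behavioral feature vector built from events


@dataclass
class VerificationPipeline:
    extract: Callable[[Sequence[Event]], Signature]       # builds the signature
    score: Callable[[Signature, List[Signature]], float]  # stand-in for the trained model
    threshold: float
    signatures: Dict[str, List[Signature]] = field(default_factory=dict)  # signature database

    def enroll(self, username: str, events: Sequence[Event]) -> None:
        self.signatures.setdefault(username, []).append(self.extract(events))

    def verify(self, username: str, events: Sequence[Event]) -> bool:
        stored = self.signatures.get(username)  # retrieved upon entry of the username
        if not stored:
            return False
        return self.score(self.extract(events), stored) <= self.threshold


def mean_gap(events: Sequence[Event]) -> Signature:
    """Toy feature: average time between consecutive captured events."""
    times = [e["t"] for e in events]
    gaps = [b - a for a, b in zip(times, times[1:])]
    return [sum(gaps) / len(gaps)]


def distance_to_mean(sig: Signature, stored: List[Signature]) -> float:
    """Toy score: distance between the new signature and the average stored one."""
    centre = sum(s[0] for s in stored) / len(stored)
    return abs(sig[0] - centre)


pipeline = VerificationPipeline(mean_gap, distance_to_mean, threshold=0.05)
pipeline.enroll("alice", [{"t": 0.0}, {"t": 0.1}, {"t": 0.2}, {"t": 0.3}])
print(pipeline.verify("alice", [{"t": 0.0}, {"t": 0.11}, {"t": 0.22}]))
```

Keeping extraction and scoring as pluggable functions lets the same skeleton serve keyboard, mouse or software-interaction features.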
Although the title of this topic is Activity Based Verification, the technique can be used in two different ways for the same purpose. One is checking the user before login; the other is checking the user after login.
Up to this point in the chapter, we have explained the first way: checking the user before login to the system. The second way is also interesting. Its general idea is to keep monitoring logged-in users and detect whether the logged-in user is the real user or not. Google has a patent for this purpose, in which the social network activity of logged-in users is used to detect fraudulent activity.
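Monitoring after login can be sketched as a small state machine that consumes one score per activity window. The sketch assumes some model already produces a genuineness score between 0 and 1 for each window; the class name, the score cut-off and the strike limit are hypothetical.

```python
class SessionMonitor:
    """Locks a session after several consecutive low-confidence activity windows."""

    def __init__(self, min_score: float = 0.5, max_strikes: int = 3):
        self.min_score = min_score
        self.max_strikes = max_strikes
        self.strikes = 0
        self.locked = False

    def observe(self, score: float) -> bool:
        """Feed one window's score; return True while the session stays open."""
        if self.locked:
            return False
        # A suspicious window adds a strike; a normal one resets the count.
        self.strikes = self.strikes + 1 if score < self.min_score else 0
        if self.strikes >= self.max_strikes:
            self.locked = True
        return not self.locked
```

Requiring several consecutive suspicious windows avoids locking out the real user after a single unusual burst of activity.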
The rest of this section explains how systems like these can be built using machine learning techniques. The most common behavioral verification techniques are based on:
a) mouse dynamics, which are derived from the user-mouse interaction;
b) keystroke dynamics, which are derived from the keyboard activity; and
c) software interaction (such as game playing), which relies on features extracted from the interaction of a user with a specific software tool.
Behavioral methods can also be characterized according to the learning approach that they employ. Explicit learning methods monitor user activity while the user performs a predefined task, such as playing a memory game. Implicit learning techniques, on the other hand, monitor the user during general day-to-day computer activity. The resulting data is less consistent because the tasks vary; nevertheless, implicit learning is the best way to learn unique user behavior characteristics such as frequently performed actions.
Keystroke dynamics features are based on measuring how long keys are pressed, as in the example given on the right. These measurements are unique for every human. Keystroke dynamics features also include, for example, the latency between consecutive keystrokes, flight time and dwell time, all based on the key down/press/up events.
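Dwell and flight times can be derived from raw key events as in the sketch below. It uses one common convention, stated here as an assumption: dwell time is the interval between a key going down and the same key coming up, and flight time is the interval between the most recent key release and the next key press. The event tuple format is hypothetical, and overlapping key presses are not treated specially.

```python
def keystroke_features(events):
    """events: list of (key, action, timestamp), action 'down' or 'up', in time order.

    Returns (dwell_times, flight_times) in the same unit as the timestamps.
    """
    pressed_at = {}      # key -> timestamp of its pending 'down' event
    dwell, flight = [], []
    last_release = None  # timestamp of the most recent 'up' event
    for key, action, t in events:
        if action == "down":
            if last_release is not None:
                flight.append(t - last_release)
            pressed_at[key] = t
        elif action == "up" and key in pressed_at:
            dwell.append(t - pressed_at.pop(key))
            last_release = t
    return dwell, flight


# Hypothetical recording of someone typing "pas", timestamps in milliseconds.
sample = [
    ("p", "down", 0), ("p", "up", 90),
    ("a", "down", 150), ("a", "up", 230),
    ("s", "down", 300), ("s", "up", 370),
]
print(keystroke_features(sample))
```

The two lists can be summarized (for example by mean and variance per key pair) to form the feature vector that a classifier is trained on.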
Keyboard-based methods are divided into methods that analyze user behavior during an initial login attempt and methods that continuously verify the user throughout the session. The former typically construct classification models from feature vectors that are extracted while users type a predefined text (such as a password), while the latter extract feature vectors from free text that users type. A recent paper evaluated the security of keystroke-dynamics authentication against synthetic forgery attacks. The results showed that keystroke dynamics are robust against the two specific types of synthetic forgery attacks that were used. Although effective, keyboard-based verification is less suitable for web browsers, since users mostly interact with them via the mouse.
People can surf the internet to read a newspaper, watch a video or perform any other action that requires only mouse interaction. Mouse dynamics are useful in both the before-login and the after-login phase. The features usable for mouse dynamics based authentication are given below. They are fed to machine learning algorithms in order to detect real users.
- Mouse-move Event (m) – occurs when the user moves the mouse from one location to another. Many events of this type occur during the entire movement; their quantity depends on the mouse resolution/sensitivity, mouse driver and operating system settings.
- Mouse Left Button Down Event (ld) – occurs when the left mouse button is pressed.
- Mouse Right Button Down Event (rd) – occurs when the right mouse button is pressed.
- Mouse Left Button Up Event (lu) – occurs after the left mouse button is released.
- Mouse Right Button Up Event (ru) – occurs after the right mouse button is released.
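The events listed above can be turned into numeric features as in the following sketch. The tuple layout `(kind, t, x, y)` and the chosen features (path length, mean speed, left-click duration) are assumptions for illustration; the time between every pair of consecutive move events is counted as movement time, which is a simplification.

```python
import math


def mouse_features(events):
    """events: list of (kind, t, x, y); kind is one of 'm', 'ld', 'lu', 'rd', 'ru'."""
    distance = 0.0
    move_time = 0.0
    click_durations = []
    last_move = None     # (t, x, y) of the previous mouse-move event
    left_down_at = None  # timestamp of a pending left-button-down event
    for kind, t, x, y in events:
        if kind == "m":
            if last_move is not None:
                lt, lx, ly = last_move
                distance += math.hypot(x - lx, y - ly)
                move_time += t - lt
            last_move = (t, x, y)
        elif kind == "ld":
            left_down_at = t
        elif kind == "lu" and left_down_at is not None:
            click_durations.append(t - left_down_at)
            left_down_at = None
        # Right-button events ('rd', 'ru') could be handled the same way.
    mean_speed = distance / move_time if move_time > 0 else 0.0
    return {
        "distance": distance,
        "mean_speed": mean_speed,
        "left_click_durations": click_durations,
    }


# Hypothetical trace: two straight moves followed by one left click.
trace = [
    ("m", 0.0, 0, 0), ("m", 1.0, 3, 4), ("m", 2.0, 6, 8),
    ("ld", 2.5, 6, 8), ("lu", 2.75, 6, 8),
]
print(mouse_features(trace))
```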
Finally, several types of software have been suggested in the academic literature to characterize behavioral biometrics of users for authentication and verification purposes. These include board games, memory games, web browsers, email clients, programming development tools, command line shells and drawing applications. These behavioral biometric features may be partially incorporated in user verification systems.
We would like to share one interesting fact about this topic. Most people know the reCAPTCHA system developed by Google. Basically, this system decides whether a connection was created by a real user or a bot. To do so, it asks a question that is easy for humans but tough for bots. Interestingly, in some cases real users pass the captcha mechanism without encountering any question. This is because reCAPTCHA applies behavioral biometric verification methods from the moment the user enters the site, so real users can get in easily. The behavioral verification technique used is based on software interaction: the system collects cookie information and browser characteristics stored in the browser, analyzes them and then decides whether or not the connection was created by a real user. When the analysis result is suspicious or the cookie information is insufficient, the system asks the user a question before granting access. (Detailed information about the captcha mechanism is given in the Captcha Bypassing section.)
Conclusion of User Authentication
Recently, due to the limitations of user authentication systems that employ a single user characteristic, such as mouse dynamics or iris patterns, a multi-modal approach has been proposed in various papers. Many studies in the literature combine several authentication techniques, because relying on a single technique is often not sufficient.
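One simple way to combine techniques is score-level fusion, sketched below as a weighted average. This is only one of several possible fusion strategies and is not taken from a specific paper; the modality names, weights and threshold are hypothetical, and each per-modality score is assumed to lie between 0 and 1.

```python
def fuse_scores(scores, weights):
    """Weighted average of the match scores of the modalities that are present."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def accept(scores, weights, threshold=0.7):
    """Accept the user when the fused score reaches the threshold."""
    return fuse_scores(scores, weights) >= threshold


# Assumed weights: the iris matcher is trusted twice as much as the others.
weights = {"keystroke": 1.0, "mouse": 1.0, "iris": 2.0}

print(accept({"keystroke": 0.6, "mouse": 0.8, "iris": 0.9}, weights))
print(accept({"keystroke": 0.2, "mouse": 0.3}, weights))
```

Normalizing by the weights of the modalities actually present lets the same function work when a sensor, such as the iris scanner, is unavailable.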