NormShield Blog

Machine Learning in Cyber Security Domain – 9: Botnet Detection


A botnet is an organized, automated army of zombies that can be used to launch DDoS attacks, flood inboxes with spam, or spread viruses. This army consists of a large number of computers. Attackers use it for malicious purposes, and the owners of the zombie machines are generally not even aware that their computers are being used this way.

Zombies have been used extensively to send spam email; as of 2005, an estimated 50–80% of all spam worldwide was sent by zombie computers. This allows spammers to avoid detection and presumably reduces their bandwidth costs, since the owners of zombies pay for their own bandwidth. The general structure of a botnet attack is described below.

This process is carried out by a centralized entity called the command and control (C&C) system, which is also referred to as the botmaster. The botmaster initiates, manages, and suspends attacks across all infected machines (bots). The aim of the C&C mechanism is therefore to increase the number of zombie machines and to coordinate them for a wide range of destructive operations. What distinguishes a botnet from other types of network attack is the existence of C&C in the network: the bots receive instructions from the C&C and act upon them. These instructions range from initiating a worm or spam attack over the Internet to disrupting legitimate user requests.

A botnet can do anything you can imagine doing with many computers connected to a network. Distributed resources are the key to a botnet's power.


As technology has advanced, every personal computer has come to have a great amount of processing power (CPU, GPU) and bandwidth capacity. Every personal computer that joins a botnet therefore makes the botnet more powerful.

Work that requires a great deal of processing power can be done easily in distributed networks. In this type of network, the work is divided into subtasks, each assigned to an individual machine. The main purpose of a botnet is to combine these multiple resources into one incredibly powerful resource. The combined resources can be bandwidth or processing capacity. After building a botnet with enough bots, attackers can use it for many malicious purposes. Some examples are:

  • Distributed Denial-of-Service Attacks (DDoS)
  • Spamming
  • Sniffing Traffic & Keylogging
  • Infecting New Hosts
  • Identity Theft
  • Attacking IRC Chat Networks
  • Hosting of Illegal Software
  • Google AdSense Abuse & Advertisement Addons
  • Click Fraud
  • Manipulating Online Polls
  • Remote Use of Computers
  • Attacking Bank Computers (ATMs or any others, since they are also networked)
  • Manipulating Games
  • Exploiting Private Documents

Humanity has witnessed many botnets and their attacks. Each of them has caused material damage to the targeted firms. Some botnets have grown enormously and caused very large damage across the world. For example, during the Mirai botnet attack on the DNS provider Dyn in October 2016, Hacker News users reported that the following sites were down: Twitter, Etsy, GitHub, SoundCloud, Spotify, Heroku, PagerDuty, Shopify, Intercom.

Netflix, Slack, Imgur, HBO Now, PayPal, PlayStation Network, Yammer, Seamless, and many other services also experienced interruptions on the day of the attack. Mirai is certainly not the only IoT botnet; we are likely to witness other IoT botnet attacks in the near future.

Botnet detection and elimination are therefore important and challenging tasks in the cyber security domain. Large, security-conscious companies have made great efforts to detect and eliminate botnets. For example, the ZeuS botnet malware package, which runs on Microsoft operating systems, operated for over three years in just this manner, eventually leading to an estimated $70 million in stolen funds and the arrest of over a hundred individuals by the FBI in 2010. ZeuS remained active even after its creator was arrested. Microsoft, which suffered the most from this botnet, put great effort into eliminating ZeuS. Eventually, in March 2012, Microsoft announced it had succeeded in shutting down the “majority” of ZeuS C&C servers.

Detecting a zombie machine is not an easy task. Even if one zombie machine is detected, what about the rest of the network? Detecting the entire network belonging to a specific botnet is a tough task, and recognizing a botnet is even harder if the zombies are IoT devices.

How can we detect a botnet? This is where machine learning comes into play.

Botnet detection is somewhat different from the detection mechanisms used by other malware/anomaly detection systems. Before explaining botnet detection techniques, we want to explain the differences and similarities between botnet detection and malware/anomaly detection.

The term anomaly detection refers to the problem of finding exceptional communication patterns in network traffic that do not fit the expected normal behavior. Each category of anomaly detection techniques makes its own assumption about what counts as normal and anomalous data.

In contrast to other types of attack detection, botnet detection refers to detecting malicious/anomalous activities that are directed within a controlled network environment. Malware distributors regard botnets as a means of disseminating malicious and anomalous activities around the globe. As a result, botnets, which consist of remotely controlled networks of hijacked computers, have become popular.

The basic aim of this distributed, coordinated network is to initiate various malicious activities over the network, including phishing, click fraud, spam generation, copyright violations, keylogging, and, most importantly, DoS attacks. (Some other examples are given above.) Botnets are identified as a serious threat to network resources over the Internet. In summary, the attacks detected by IDS/IPSs each follow an individual pattern and originate from one specific source, whereas attacks produced by a botnet are part of a large network. The goal of botnet detection is to find all compromised assets of the botnet and take down its C&C servers.

In this chapter, we focus only on botnet detection techniques developed using machine learning and give a brief explanation of how these techniques work. There are many such techniques in the literature. The general structure of botnet detection techniques is described below.

Botnet detection techniques are classified into two broad categories: IDSs and honeynets. A honeynet is used to collect information from bots for further analysis, in order to assess the technology used, the botnet's characteristics, and the intensity of the attack. Moreover, the information collected from bots is used to discover the C&C system, unknown vulnerabilities, the techniques and tools used by the attacker, and the attacker's motivation. A honeynet is also used to collect bot binaries, which help in penetrating botnets. However, intruders have developed novel methods to overcome honeynet traps. The key component of a honeynet trap is the honeywall, which separates the honeybots from the rest of the world.

The other category of botnet detection techniques is based on IDSs. An intrusion detection system (IDS) is a software application or hardware device that monitors system services for malicious activities. IDS detection techniques are further classified into two types of approaches: signature-based and anomaly-based.

In signature-based systems, botnet signatures provide information about the behavior of specific botnets. However, this type of technique cannot detect an unknown botnet for which no signature has yet been created.
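To make this limitation concrete, here is a minimal Python sketch of signature matching. The signature names and byte patterns are made up for illustration and are not real botnet signatures; real signature-based systems use far richer rule languages.

```python
# Hypothetical signature database: name -> byte pattern seen in C&C traffic.
# These patterns are invented for illustration only.
SIGNATURES = {
    "example-irc-bot": b"JOIN #cmd-channel",
    "example-http-bot": b"POST /hypothetical-gate",
}


def match_signatures(payload, signatures=SIGNATURES):
    """Return the names of all signatures whose pattern occurs in the payload."""
    return [name for name, pattern in signatures.items() if pattern in payload]


# Traffic from a bot whose signature is already in the database is flagged.
known = match_signatures(b"NICK bot42\r\nJOIN #cmd-channel\r\n")

# Traffic from a new variant matches nothing and goes unnoticed.
unknown = match_signatures(b"GET /new-variant/checkin HTTP/1.1\r\n")

print(known)
print(unknown)
```

The second call returns an empty list, which is exactly the weakness described above: without a stored signature, the bot is invisible to this kind of detector.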

Anomaly-based detection is a prominent research domain in botnet detection. The basic idea comes from analyzing several network traffic irregularities including traffic passing through unusual ports, high network latency, increased traffic volume, and system behavior indicating malicious activities in the network. Anomaly-based approaches are further divided into host- and network-based approaches. In host-based approaches, individual machines are monitored to find suspicious actions. Despite the importance of host-based monitoring, this approach is not scalable, as all machines are required to be fully equipped with effective monitoring tools.
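As a rough illustration of the "increased traffic volume" irregularity, the sketch below flags samples that deviate strongly from a baseline. The z-score rule, the threshold of 3.0, and the sample numbers are our own assumptions for the example, not a method taken from a specific paper.

```python
from statistics import mean, stdev


def volume_anomalies(baseline, observed, threshold=3.0):
    """Return indices of observed samples that deviate from the baseline
    mean by more than `threshold` standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is unusual.
        return [i for i, v in enumerate(observed) if v != mu]
    return [i for i, v in enumerate(observed) if abs(v - mu) / sigma > threshold]


# Hypothetical traffic volume in KB per minute for one host.
baseline = [100, 110, 95, 105, 98, 102, 107, 99]
observed = [101, 104, 480, 97]

print(volume_anomalies(baseline, observed))
```

Only the third observed sample (index 2) stands out from the baseline. A real system would combine several such indicators, for example unusual ports and latency, rather than rely on volume alone.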

Unlike the other techniques, network-based approaches analyze network traffic and extract information about botnets using machine learning techniques. A network monitoring tool examines network behavior based on different network characteristics, such as bandwidth, burst rate, and packet timing, for evidence of botnet C&C. It filters out traffic that is unlikely to be part of botnet activity and classifies the remaining traffic into a group that is likely to be part of a botnet.
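The filter-then-classify idea can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the Flow record, the whitelist, and the two thresholds are hypothetical, and a real tool would learn its decision rule from labelled traffic instead of hard-coding it.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    src: str
    dst: str
    total_bytes: int
    duration_s: float
    packets: int


def features(flow):
    """Derive two simple characteristics: bandwidth and average packet gap."""
    duration = max(flow.duration_s, 1e-9)  # avoid division by zero
    bandwidth = flow.total_bytes / duration
    avg_gap = duration / max(flow.packets - 1, 1)
    return bandwidth, avg_gap


def filter_and_classify(flows, whitelist, max_bandwidth=2000.0, min_gap=1.0):
    """Stage 1 drops traffic unlikely to be botnet-related; stage 2 keeps
    flows that look like low-volume, widely spaced C&C chatter."""
    suspicious = []
    for flow in flows:
        if flow.dst in whitelist:
            continue
        bandwidth, gap = features(flow)
        if bandwidth <= max_bandwidth and gap >= min_gap:
            suspicious.append(flow)
    return suspicious


flows = [
    Flow("10.0.0.5", "198.51.100.7", 6000, 600.0, 61),          # slow, regular
    Flow("10.0.0.5", "203.0.113.10", 50_000_000, 120.0, 40000),  # bulk download
    Flow("10.0.0.8", "192.0.2.1", 3000, 300.0, 31),              # whitelisted
]
result = filter_and_classify(flows, whitelist={"192.0.2.1"})
print([f.dst for f in result])
```

With these made-up flows, only the slow, regular connection to 198.51.100.7 survives both stages.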

Machine learning techniques are widely used in both anomaly-based approaches, host-based and network-based. Techniques in use include decision trees, neural networks, graph theory, artificial immune systems, clustering-based techniques, data mining-based techniques, correlation, and entropy.
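Of the techniques listed, entropy is the easiest to show in a few lines. The sketch below computes the Shannon entropy of the destination ports a host contacts; the port lists are invented, and treating high entropy as a hint of scanning behavior is our simplifying assumption, not a rule from a particular study.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical destination ports seen for two hosts.
normal_host = [443, 443, 443, 443, 443, 443, 80, 80]
scanning_host = [21, 22, 23, 25, 80, 110, 443, 3389]

print(round(shannon_entropy(normal_host), 3))
print(round(shannon_entropy(scanning_host), 3))
```

The host that talks to eight different ports once each reaches the maximum of 3 bits for eight samples, while the host that mostly uses port 443 stays below 1 bit.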

Machine Learning Host Based Botnet Detection Techniques

In host-based anomaly detection techniques, the behavior of bots is investigated by scanning the processes related to specific applications installed on the host machine. Each bot independently initiates the commands received from the C&C system. Each command has certain parameters, specific types, and a predetermined execution order.

There are many studies on host-based approaches to detecting botnets, which work on the client side. Some of them are explained below. We drew heavily on the article when writing this botnet detection article.

BotSwat (Stinson and Mitchell, 2007) is a tool for monitoring home operating systems (such as Windows XP, Windows 2000, and Windows 7) and recognizing home machines suspected of being bots. Initially, BotSwat acts as a scanner, monitoring the execution status of the Win32 library and observing the runtime system calls made by a process. It tries to discover bots through generic properties, regardless of the particular C&C architecture, communication protocol, or botnet structure. The problem with this approach is the lack of security for system calls.

Masud et al. (2008) developed an effective host-based botnet detection technique that uses a flow-based detection method, correlating multiple log files on the host machines. Because bots normally respond more quickly than humans, bot activity can be readily recognized by mining and correlating multiple log files. The authors propose that these techniques work efficiently for both IRC and non-IRC bots, by correlating several host-based log files to detect C&C traffic.

The multi-agent bot detection system (MABDS) (Szymczyk, 2009) is a hybrid technique that combines an event-log analyzer with a host-based intrusion detection system (HIDS). It uses multi-agent technology, which combines an administrative agent, a user agent, a honeypot agent, system analysis, and a knowledge database. The basic problem with this technique is the slow convergence of new signatures with the knowledge base.
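The log-correlation idea, that bots react to incoming commands faster than a human could, can be illustrated with a small sketch. This is not the actual method of Masud et al.; the two timestamp lists stand in for a network log and a host activity log, and the 0.1 second threshold is an assumption chosen for the example.

```python
from bisect import bisect_right


def fast_responses(incoming_ts, action_ts, max_delay=0.1):
    """Pair each host action with the latest incoming packet before it and
    keep the pairs whose delay is at most `max_delay` seconds."""
    incoming = sorted(incoming_ts)
    hits = []
    for t in sorted(action_ts):
        i = bisect_right(incoming, t)
        if i == 0:
            continue  # no packet preceded this action
        delay = t - incoming[i - 1]
        if delay <= max_delay:
            hits.append((incoming[i - 1], t))
    return hits


# Hypothetical timestamps (seconds) taken from two separate logs.
incoming_packets = [10.00, 25.00, 40.00]   # network log
process_launches = [10.02, 31.50, 40.05]   # host log

hits = fast_responses(incoming_packets, process_launches)
print(hits)
```

Two of the three process launches follow an incoming packet within a few hundredths of a second, which is faster than a person would plausibly react; the launch at 31.50 comes 6.5 seconds after the previous packet and is not flagged.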

Machine Learning Network Based Botnet Detection Techniques

In a network-based botnet detection strategy, malicious traffic is captured by observing network traffic across different parameters, including network traffic behavior, traffic patterns, response time, network load, and link characteristics. Network-based approaches are further classified into two types: active monitoring and passive monitoring.

Active monitoring: In an active monitoring botnet detection policy, new packets are injected into the network in order to detect malicious activities.

Passive monitoring: In passive monitoring, network traffic is sniffed as the data passes through the medium. The network traffic is then analyzed by applying different anomaly detection techniques. Passive monitoring techniques employ various models, including statistical approaches, graph theory, machine learning, correlation, entropy, stochastic models, decision trees, discrete time series, Fourier transformation, group-based analysis, data mining, clustering approaches, neural networks, visualization, and combinations of these technologies.
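As one small example of a statistical, time-series style of passive analysis, the sketch below checks whether the connections a host makes to one destination are suspiciously regular. The coefficient-of-variation rule and the 0.1 cutoff are assumptions we picked for illustration; the timestamps are invented.

```python
from statistics import mean, pstdev


def interarrival_cv(timestamps):
    """Coefficient of variation of the gaps between consecutive events,
    or None if there are too few events to judge."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None
    mu = mean(gaps)
    if mu == 0:
        return None
    return pstdev(gaps) / mu


def looks_periodic(timestamps, max_cv=0.1):
    """True when the gaps are nearly constant, as in timer-driven check-ins."""
    cv = interarrival_cv(timestamps)
    return cv is not None and cv <= max_cv


# Hypothetical connection times (seconds) sniffed from the network.
beacon_like = [0, 60, 120, 181, 240, 300]
human_like = [0, 5, 7, 90, 95, 400]

print(looks_periodic(beacon_like))
print(looks_periodic(human_like))
```

The first series checks in roughly every 60 seconds and is flagged as periodic; the second has irregular gaps and is not. Nothing is injected into the network, which is what makes this a passive technique.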

BotProb (Tokhtabayev and Skormin, 2007) is considered an active monitoring strategy; it injects packets into the network payload to find suspicious activity and to determine whether it is caused by humans or bots. Non-human bots usually transmit commands in a predetermined pattern, which corresponds to the cause-and-effect correlation between the C&C and the bots. Such a command-and-response architecture makes the existence of bots easy to determine, because the response follows from the predetermined command behavior.
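The cause-and-effect idea behind active probing can be mimicked with a toy simulation. This is not the BotProb algorithm: the two responder functions below are hypothetical stand-ins for a scripted bot and a human, and no real packets are sent.

```python
from itertools import cycle


def probe_is_deterministic(respond, probe, trials=5):
    """Send the same probe several times; a client following a fixed
    command-response pattern returns the same reply every time."""
    replies = {respond(probe) for _ in range(trials)}
    return len(replies) == 1


def scripted_bot(message):
    # Simulated bot: always reacts to a command in the same way.
    return "ACK " + message


def make_human_like():
    # Simulated human: the answer changes from one message to the next.
    phrases = cycle(["what?", "who is this?", "sorry, busy"])
    return lambda message: next(phrases)


bot_result = probe_is_deterministic(scripted_bot, "PING 1234")
human_result = probe_is_deterministic(make_human_like(), "PING 1234")
print(bot_result, human_result)
```

The scripted responder gives an identical reply to every repeated probe, while the human-like responder does not. An actual active monitor would have to inject its probes carefully, since extra traffic can disturb legitimate users.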