<![CDATA[Malware Capture  facility project - Analysis]]>Fri, 08 Jan 2016 02:01:08 -0800Weebly<![CDATA[malware uses multiple web servers to have a periodic http C&C connection while its netflows are not periodic]]>Mon, 10 Nov 2014 13:34:19 GMThttp://mcfp.weebly.com/analysis/botnet-uses-multiple-web-servers-to-have-a-periodic-http-cc-connection-while-its-netflows-are-not-periodicPictureSome hours of traffic in the CTU-89-1 capture
While analyzing our capture CTU-Malware-Capture-Botnet-89-1, we found some strange issues with the periodicity of the C&C channels. The capture contains a lot of HTTP connections, but few of them were periodic. When analyzing a network capture we usually start with the NetFlows and then move on to the payload data. What we found is that several periodic HTTP connections had non-periodic NetFlows. This seemed strange, so we took a deeper look.

The traffic of this malware looks something like this in our monitoring server:

We first converted the pcap file to a web log file (using justniffer) to see the HTTP requests better. Some example requests are:

TimeStamp Method URL
1339.609 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY
1632.742 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY
1933.323 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY

To find the periodicity of these requests we compute the difference between consecutive timestamps and print it in the first column. These differences were around 300 seconds, i.e. 5 minutes:

293.133 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY
300.581 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY
293.941 GET http://msg.video.qiyi.com/vod.gif?method=ppsshare&platform=pc&deviceid=BAACOW674EDUE4BECYWWZZZBNAPAWAEY
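
As a minimal sketch of this step, the differences can be computed from the timestamps of the three example requests listed earlier. The function name `request_gaps` is our own choice for illustration. Only the first two differences can be reproduced here, because the third one (293.941) depends on a later request that is not in the sample.

```python
# Timestamps (in seconds) of the three example requests listed earlier.
timestamps = [1339.609, 1632.742, 1933.323]


def request_gaps(times):
    """Return the gap in seconds between each request and the previous one."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


gaps = request_gaps(timestamps)
for gap in gaps:
    print(f"{gap:.3f}")
```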

This confirmed that these HTTP requests were periodic, but what about their NetFlows? To find the NetFlows we extracted the IP addresses used in these requests and sorted them by number of requests. The results are:

Amount, IP
  • 16,
  • 18,
  • 18,
  • 18,
  • 19,
  • 20,
  • 21,
  • 25,
  • 25,
  • 26,
  • 29,
  • 30,
  • 31,

This was the first interesting part, since the same URL was being requested alternately from different IP addresses. Once the IP addresses were extracted, we generated their 4-tuples to see their periodicity (described in the HackLu 2014 presentation). To get the 4-tuples we first convert the pcap file to a bidirectional Argus file:

argus -F argus.conf -r 2014-09-15_capture-win2.pcap -w 2014-09-15_capture-win2.biargus

Then we extract the NetFlows (the ra.conf file has specific fields):

ra -r 2014-09-15_capture-win2.biargus -n -Z b -F ra.conf > 2014-09-15_capture-win2.binetflow

Then we use our CCDetector.py program (to be released soon), which implements the state-based behavioral model (also described in the HackLu 2014 presentation).

CCDetector.py -f 2014-09-15_capture-win2.binetflow -P oneline > 2014-09-15_capture-win2.3model

From this .3model file we can see the characteristics of the 4-tuples related to the IP addresses:

  • State:220s0ssss0ssss0ssss0ss0ssssssst0s
  • State:120ss0s0sss0s0s0s0s0s0s0ss0s0ss0s
  • State:110r
  • State:22sssss0s0s0ssb0ss0sss0s0sssss0ss
  • State:220s0s0s0s0ss0s0s0s0s0ss0s0sss
  • State:220s0sssbbss0sss0ss0ss0ss0ssssssssss
  • State:220ssbs0ss0s0ss0sssssssssbB0ss0s0ssss0s0s
  • State:22ssss0s0s0sss0sssss
  • State:110r
  • State:22sss0sssssss0ssss0s0ss0ssssss0ssbsb0s
  • State:22ssss0s0st0ss0sst0s0ss0s0s0ss
  • State:22Bbsbssssss0s0sss0ssB0ss0ss0ss
  • State:220ts0sss0ssss0s0ssss0ss
  • State:22ssts0s0ss0ss0sss0s0ss0s0ss
  • State:23b0sss0s0ssss0s0sss0s0s0s

In our state-based behavioral model the letters for periodic flows are 'a' to 'f' and 'A' to 'F'. Considering that the letters in these previous states were mostly 's' and '0', we conclude that there are NO periodic flows in these connections. However, we know that the HTTP requests are periodic. So what happened? 
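
This check can be sketched in a few lines, assuming only what is stated here: the letters 'a' to 'f' and 'A' to 'F' mark periodic flows. The helper `periodic_share` is hypothetical and is not part of CCDetector.py. It counts over every symbol in the string, digits included, which is a simplification.

```python
# Letters that mark periodic flows, as stated in the text.
PERIODIC_LETTERS = set("abcdefABCDEF")


def periodic_share(state):
    """Fraction of symbols in a state string that are periodic letters."""
    if not state:
        return 0.0
    hits = sum(1 for symbol in state if symbol in PERIODIC_LETTERS)
    return hits / len(state)


# Three of the state strings listed earlier.
states = [
    "220s0ssss0ssss0ssss0ss0ssssssst0s",
    "110r",
    "22Bbsbssssss0s0sss0ssB0ss0ss0ss",
]
for state in states:
    print(f"{periodic_share(state):.2f}  {state}")
```

A share close to zero, as in these strings, matches the conclusion that the connections are not periodic.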

To confirm that these 4-tuples are not periodic we can 'open' a 4-tuple and look at the flow-by-flow analysis. This is the information for the first 4-tuple:

        1970-01-01 02:37:09.810596      T1=-1  T2=-1  TD=   0.0
        1970-01-01 02:42:09.781309      T1=-1  T2=299.970713  TD=   0.0
        1970-01-01 05:27:11.154282      T1=299.970713  T2=9901.372973  TD=9601.4
        1970-01-01 07:17:11.504023      T1=9901.372973  T2=6600.349741  TD=-3301.0
        1970-01-01 08:07:12.292119      T1=6600.349741  T2=3000.788096  TD=-3599.6
        1970-01-01 08:37:12.330020      T1=3000.788096  T2=1800.037901  TD=-1200.8

Here T2 is the time difference between the current flow and the previous one, and T1 is the time difference between the previous flow and the one before it. The values shown mean that the times between these requests were 299s, 9901s, 6600s, 3000s, etc., which are not periodic. So we confirm that the flows for this IP address were not periodic.
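
The three columns can be reproduced with a short sketch. It assumes, based on the listing, that TD is T2 minus T1, that -1 marks a difference that does not exist yet, and that TD stays at 0.0 until both differences are available. The function name `time_differences` is our own and is not taken from CCDetector.py.

```python
def time_differences(times):
    """Return a (T1, T2, TD) tuple for each flow time, given times in seconds.

    T2 is the gap to the previous flow and T1 is the gap before that.
    -1 marks a gap that does not exist yet, and TD is 0.0 until both
    gaps are available (assumed from the listing in the text).
    """
    rows = []
    for i in range(len(times)):
        t2 = times[i] - times[i - 1] if i >= 1 else -1
        t1 = times[i - 1] - times[i - 2] if i >= 2 else -1
        td = t2 - t1 if i >= 2 else 0.0
        rows.append((t1, t2, td))
    return rows


# Seconds since midnight for the first four flows of the 4-tuple in the text.
flows = [
    2 * 3600 + 37 * 60 + 9.810596,
    2 * 3600 + 42 * 60 + 9.781309,
    5 * 3600 + 27 * 60 + 11.154282,
    7 * 3600 + 17 * 60 + 11.504023,
]
for t1, t2, td in time_differences(flows):
    print(f"T1={t1:.6f}  T2={t2:.6f}  TD={td:.1f}")
```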

The answer to this problem is that the bot was sending HTTP requests to a specific URL, but the IP addresses assigned to the web server kept changing in some sort of load balancing scheme. This is very common in normal applications, but in this case the malware is using a complex load balancing to have a periodic C&C HTTP connection.
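
The effect can be illustrated with a small sketch on synthetic data. The requests, the URL, and the IP addresses (taken from the 192.0.2.0/24 documentation range) are made up for illustration and do not come from the capture. Grouping the same requests by server IP gives irregular gaps, while grouping them by URL gives a constant 300-second gap. This fits the flow listing for the first 4-tuple, where the gaps (about 9901s, 6600s, 3000s and 1800s) are all close to multiples of 300 seconds.

```python
from collections import defaultdict

# Synthetic requests for illustration only: (time in seconds, server IP, URL).
# The IPs are from the documentation range 192.0.2.0/24 and the URL is made up.
requests = [
    (0, "192.0.2.1", "http://example.com/beacon"),
    (300, "192.0.2.2", "http://example.com/beacon"),
    (600, "192.0.2.3", "http://example.com/beacon"),
    (900, "192.0.2.2", "http://example.com/beacon"),
    (1200, "192.0.2.3", "http://example.com/beacon"),
    (1500, "192.0.2.1", "http://example.com/beacon"),
    (1800, "192.0.2.1", "http://example.com/beacon"),
]


def gaps_by(requests, field):
    """Group request times by IP (field=1) or URL (field=2) and return the gaps."""
    times = defaultdict(list)
    for request in sorted(requests):
        times[request[field]].append(request[0])
    return {key: [b - a for a, b in zip(ts, ts[1:])] for key, ts in times.items()}


print(gaps_by(requests, 1))  # per server IP: irregular gaps
print(gaps_by(requests, 2))  # per URL: a constant 300-second gap
```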


The implications of this load balancing scheme are:
  • When researchers analyze network traffic, we tend to consider each connection separately. A detection method based on NetFlows will most probably miss this periodicity.
  • If the web log analysis uses the IP address of the web server as an index, the researcher may miss the connections to the rest of the IP addresses.
  • Finally, we think that the owner of the malware is not aware of these complications, because the load balancing seems to be designed to give more resilience to the botnet and not to hide the network patterns.

<![CDATA[Malware started to randomize the request times in relation with their C&C channels]]>Tue, 05 Aug 2014 11:01:57 GMThttp://mcfp.weebly.com/analysis/malware-started-randomizing-the-request-times-of-their-cc-channels
Yesterday we found a malware sample that uses a DGA to find the domains of its C&C servers. It became interesting when we noticed that the DGA communication deliberately avoids periodic requests. In fact, it seems to generate the request times of its DGA packets specifically to avoid being periodic.

The malware, whose MD5 is c740789d5b226668f8a37626883fd0b7, is detected by AVAST as Win32:Dropper-KRG [Drp] and by Sophos as Mal/Steppa-A. The dataset where this behavior was found can be downloaded from CTU-Malware-Capture-Botnet-31; it was captured between Nov 2013 and Jan 2014 in our capture facility. In the capture file 2013-11-25_capture-win7-3.pcap there is a large group of packets going to one IP address, destination port 53/TCP. These packets contain DNS requests for domains generated with a DGA. For example:
  • kbzmyrj.net
  • gczdamdbyahv.net
  • dgcfpcofdwmt.net
  • kxfighr.com
  • fqmtpgifhiyb.net
  • jpagdbaepbm.net

These connections differ from normal DNS requests because:
  • They use the TCP protocol instead of UDP.
  • They are being made to a DNS server chosen by the attacker and not to the one defined by the network. 

Moreover, the simple trick of using TCP works well because some analysis tools, such as passivedns, fail to find the requests.

The analysis of all these DNS requests using our behavioral state model (CCDetector.py tool) shows that the requests are not periodic. The following is a sample of the flows sent in these DNS requests:

"Time"  "Time Difference 1 (T1)" "Time Difference 2 (T2)" "Difference of Time Differences (TD)"
01:03:07.901969,   T1=-1,                T2=-1,                TD= 0.0
01:03:13.099428,   T1=-1,                T2=5.197459,   TD= 0.0
01:03:13.970381,   T1=5.197459,   T2=0.870953,   TD= -4.3
01:04:32.543049,   T1=0.870953,   T2=78.572668, TD= 77.7
01:05:29.735165,   T1=78.572668, T2=57.192116, TD= -21.4
01:05:35.954195,   T1=57.192116, T2=6.21903,     TD= -51.0
01:05:45.808630,   T1=6.21903,     T2=9.854435,    TD= 3.6
01:05:57.415327,   T1=9.854435,   T2=11.606697, TD= 1.8
01:06:43.150694,   T1=11.606697, T2=45.735367, TD= 34.1
01:07:00.225639,   T1=45.735367, T2=17.074945, TD= -28.7

The columns mean:
  • "Time": Time when the flow was seen.
  • "Time Difference 2 (T2)": Time between the current flow and the previous one.
  • "Time Difference 1 (T1)": Time between the previous flow and the one before it.
  • "Difference of Time Differences (TD)": T2 minus T1.

The TD value is a good indicator of the periodicity of the requests. The closer TD is to 0, the more periodic the flows are. In this case we can see that there is no periodicity. However, it is interesting that the T1 values vary so much, which is not usually the case with programs that try to communicate with their C&C server often. So we decided to analyze the values of the T1 column to see if there was a relationship between them. A quick plot of their probability distribution shows this:
Histogram of time differences between packets for the DGA.
This plot shows how the times between DNS requests may follow a probability distribution with these parameters:
  • Median = 23 seconds
  • Mean = 44 seconds
  • Max value = 55200 seconds
  • Stdev = 556
This is a strong indicator that the times between packets of this DGA are neither random nor periodic. Moreover, there seems to be an underlying probability distribution generating the packets. If this is confirmed, it may be the first time that the requests of a C&C channel are not generated with a fixed frequency, but are instead deliberately timed by the malware using a probability distribution function.
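
As a rough sketch of how such summary statistics can be obtained, the following uses only the nine T2 values from the ten-row sample shown earlier. The numbers it prints therefore describe that small sample and will not match the figures reported for the full capture. The function name `summarize` is our own.

```python
import statistics

# Gaps in seconds between consecutive flows (the T2 column of the sample).
gaps = [5.197459, 0.870953, 78.572668, 57.192116, 6.21903,
        9.854435, 11.606697, 45.735367, 17.074945]


def summarize(values):
    """Return the median, mean, maximum and sample standard deviation."""
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "max": max(values),
        "stdev": statistics.stdev(values),
    }


for name, value in summarize(gaps).items():
    print(f"{name}: {value:.3f}")
```

Even in this small sample the mean is well above the median, the same kind of skew that the full-capture figures show.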
<![CDATA[Analysis of CTU-MALWARE-CAPTURE-1 (ZBOT.OOWO)]]>Sat, 01 Mar 2014 19:55:20 GMThttp://mcfp.weebly.com/analysis/analisis-ofctu-malware-capture-1-zbotoowo
ctu-malware-capture-1 (Zbot.oowo)
This capture was done between Thu Sep  5 15:40:07 CEST 2013 and Tue Oct  1 13:38:29 CEST 2013, a total of 25 days and 21 hours. It corresponds to a binary with the MD5 46b3df3eaf1312f80788abd43343a9d2 that was classified by Kaspersky in VirusTotal as Trojan-Spy.Win32.Zbot.oowo. However, we are not sure of the name.

The next image shows the complete graph of the traffic during the 25 days. It includes:
  • The number of UDP packets per minute.
  • The number of TCP packets per minute.
  • The number of DNS packets per minute. (UDP and dst port 53)
  • The number of SPAM packets per minute. (TCP and dst port 25)
  • The number of SSH packets per minute. (TCP and dst port 22)
  • The number of WEB packets per minute. (TCP and dst port 80)
  • The number of SSL packets per minute. (TCP and dst port 443)
  • The number of IPV6 packets per minute.

It was generated using the argus flows and RRD.

The image is large enough to zoom in and analyze.

The original files of this capture can be found on CTU-Malware-Capture-Botnet-1.

This capture consists of two pcap files, each containing the traffic of one bot.
  • 2013-10-01_capture-win12.pcap (5.8G)
  • 2013-10-01_capture-win8.pcap (5.6G)


The first action done by the botnet was to send UDP packets to some IPs. There was no DNS resolution before them, so the group of IP addresses was hard-coded inside the binary. These UDP connections seem to be part of a P2P protocol because they were done in groups, they seem to be encrypted (because their content is statistically random) and they are repeated every X minutes.
One of these UDP connections was answered (we call such connections established) and some information was downloaded. As soon as this UDP connection was established, a TCP connection was made to the same IP. It was also probably encrypted.
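
One common way to check whether content looks statistically random is to measure the Shannon entropy of its bytes. The post does not say which test was used, so the following is only a sketch of one possible check. The function name `shannon_entropy` and both example payloads are made up for illustration and are not taken from the capture.

```python
import math
import os
from collections import Counter


def shannon_entropy(payload: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not payload:
        return 0.0
    total = len(payload)
    counts = Counter(payload)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Made-up payloads for illustration; they are not taken from the capture.
text_like = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n" * 20
random_like = os.urandom(4096)

print(f"text-like payload:   {shannon_entropy(text_like):.2f} bits/byte")
print(f"random-like payload: {shannon_entropy(random_like):.2f} bits/byte")
```

Values near 8 bits per byte are consistent with encrypted or compressed content, although short payloads cannot reach that value, so the result for small UDP packets should be read with care.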

Some seconds later the bot started to connect to https://www.google.com (using TLS). The Google connections continued during the whole capture. We are not sure what these connections are used for.

After some more established UDP connections and Google searches, the bot downloaded some binary files. The next image shows these packets.
The first binary was downloaded from iwvsales.com/bc.exe and the second from www.solutics.ch/oKnUAf.exe
Eight minutes after the binary downloads, the bot started to resolve host names. It connected to some SMTP servers to try to send SPAM, and it contacted some web servers using POST requests. The next image shows an SMTP connection to smtp.live.com followed by the POST requests.
As can be seen in the images, we added a label to every flow. This was done manually for each capture using the ralabel tool. The biargus file in the dataset contains the labels, so you can use them with the ra* client tools. The histogram of labels, so far, is:

 129871 Background
  23776 Background-ARP
 780609 From-Botnet-V2-DNS
 697878 From-Botnet-V2-SPAM
3423727 From-Botnet-V2-TCP-Attempt
 177176 From-Botnet-V2-TCP-Established
    997 From-Botnet-V2-TCP-HTTP-Google-Net-Established-1
    369 From-Botnet-V2-TCP-HTTP-SSL-Google-Net-Established-1
 177181 From-Botnet-V2-UDP-Attempt
  11166 From-Botnet-V2-UDP-Established

behavioral analysis

One of the main goals of the MCFP is to analyze the behavior of the malware. In this case we will analyze the periodicity of flows using our own behavioral model. This model uses a Markov Chain to represent the changes in the states of each connection. 

One behavioral pattern that we can analyze is the set of all the flows sent by the bot to one IP address and destination port 8033 using the UDP protocol. We can see that the bot is connecting to it every 30 minutes. The following is the basic information about this pattern:

- - 8033   TimeDiff:3 days, 5:55:45 State:55EeeEEeEEeEEhIvveeeeeeEEeeeeeeEEeeEeeeEeeeeeeeeeEeeeEeeEEeeEvvveEEeeeeeeedd #flows:153  DurMed:0.17s  SizeMed:0.386KB  FreqMed:30.000m  TDMed:21.000s  Label:From-Botnet-V2-UDP-Established

The pattern shows that the bot is connecting to this IP address with a median frequency of exactly 30 minutes. Thus, this connection is highly periodic.

The 153 flows of this connection were sent over 3 days and were very short, with a median duration of 0.17s. They were also small, with a median size of 386 bytes. The label only represents the basic characteristics of the flow and was automatically assigned because the manual analysis is still in progress.

The State property uses letters to show the changes in the Markov Chain. All the 'e' and 'E' letters represent periodic connections between three flows. All the 'v' states represent the loss of periodicity. It is interesting how the flows changed over time (change of letters), getting more bytes or getting shorter. This indicates that no connection is perfectly stable. More info about this model will be published later.

The next plot is an analysis of these UDP connections in groups during several days. The X axis corresponds to hours and the Y axis to the number of flows every 30 minutes. It can be seen that there are a lot of attempted UDP flows at first, followed by some established UDP connections. However, the number of established UDP connections keeps decreasing until the bot receives a new group of UDP addresses to try (red lines). This behavior is very similar to that of P2P connections.