This is a summary of initial (triage) analysis of Emotet droppers and the associated network traffic from the fall of 2019. This write-up covers the tools and techniques for assessing the malicious samples and gathering initial indicators of compromise (IOCs). While Emotet will certainly continue to evolve, the approach outlined here provides a solid foundation for anyone looking to analyze Emotet (or similar malware).
Please Click Enable Content
Since resuming operations in September 2019, Emotet has succeeded in regaining a foothold as a dominant botnet. To accomplish this, Emotet regularly uses macro-enabled Microsoft Office documents to retrieve and drop the primary Emotet payload. This trojan opens the door to a variety of nefarious activity, from modular payloads with a wide range of functionality to dropping other malware. However, to gain that initial foothold, Emotet depends on the recipient enabling macros. As is common with threat actors leveraging Office documents, this comes down to convincing social engineering in the make-up of the document to trick the user into executing the macro code. As can be seen in Figure 1, a recent Emotet document contains text claiming a compatibility difference with the user’s version of Microsoft Office and stating that clicking “Enable Content” will resolve the issue. While simple, this is often effective: by enabling content, the user unwittingly executes the macros and lets the attacker in.
The most significant change in behavior observed during this time comes in how the documents leveraged base64 encoded strings for PowerShell scripts. Initially the base64 encoded strings were simply stored in a user-form within the document, a common technique for storing additional resources for the macro code such as shellcode, strings and base64 encoded PowerShell scripts. Figure 2 shows an example of this approach in which running the strings tool on the document reveals the complete base64 encoded PowerShell script in the document.
Since the base64 encoded string contains the PowerShell that downloads the Emotet trojan, decoding the string allows for extraction of all URLs used to download the Emotet binaries. Identifying the base64 encoded string was trivial, which in turn made discovering the first stage command and control URLs trivial. It did not take long before changes in this approach were observed. Instead of simply placing the base64 encoded string in the document, the Emotet authors began padding the string with random sequences of characters. These characters are stripped out at runtime before the string is decoded and used to execute the subsequent PowerShell script. Figure 3 shows a user-form with several text boxes; the obfuscated base64 string begins with “12jnJA12j12jnn…”.
In this example, the padding value is “12jn” while the beginning of the valid base64 string is “JA”. A snippet of the obfuscated string extracted with strings can be seen in Figure 4.
In addition to using padding, the padding value is often split by another copy of itself. For example, the obfuscated base64 string starts with “12jnJA12j12jnnBK…”. The padding value (12jn) can be identified statically and removed through string replacement. In this case, however, one round of replacement results in “JA12jnBK”: because the padding value was split by itself, removing the inner copy reveals another instance. Completely deobfuscating the base64 string therefore requires a recursive (repeated) approach that removes the padding value until no instance remains. This defeats naive automatic extraction and decoding of the PowerShell script and protects the payload download URLs a bit longer. The extracted PowerShell script can be seen in Figure 5.
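The repeated replacement described above can be sketched in a few lines of Python. This is only an illustration of the technique, not the script used for this analysis; the function name `strip_padding` is hypothetical, and the input is the short prefix quoted in the text rather than a full sample.

```python
def strip_padding(obfuscated: str, padding: str) -> str:
    """Remove every occurrence of `padding`, repeating until none remain."""
    if not padding:
        raise ValueError("padding must be a non-empty string")
    current = obfuscated
    while padding in current:
        # A single pass can splice the two halves of a split padding value
        # back together, so keep going until a pass finds nothing to remove.
        current = current.replace(padding, "")
    return current


sample = "12jnJA12j12jnnBK"                  # prefix quoted in the text
first_pass = sample.replace("12jn", "")      # one round of replacement
print(first_pass)                            # JA12jnBK
print(strip_padding(sample, "12jn"))         # JABK
```

One caveat with this sketch: it assumes the padding value never occurs legitimately in the underlying base64 data, so the result should be sanity-checked by confirming that it decodes cleanly.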
Peering into the Communication
The Emotet trojan is downloaded using PowerShell, which creates an unusual network traffic artifact. PowerShell, by default, issues the HTTP request with minimal HTTP headers, something that stands out from normal web traffic generated by a web browser. This is due, in part, to the browser’s tendency to be much more verbose in the number of HTTP request headers used.
Since the request is for an executable file, this also presents an opportunity to correlate potentially suspicious behavior from a host.
A significant number of the dropper hosts observed were neither using TLS nor obfuscating/encrypting the returned PE file. The popular Emerging Threats open rule set can identify this type of traffic, although requesting and receiving a PE file is not, in and of itself, suspicious. Other events, such as the presence of minimal HTTP headers, can be used to build context around a potentially malicious event and elevate network traffic to a level worth investigating.
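As a rough illustration of how the "minimal HTTP headers" observation could be turned into a triage check, the sketch below parses a raw cleartext HTTP request and flags it when it carries very few headers and lacks most of the headers a browser normally sends. The header list, the threshold of three, and the function names are assumptions chosen for this example, not values taken from any rule set.

```python
# Headers a typical browser request is assumed to carry (illustrative list).
EXPECTED_BROWSER_HEADERS = {"user-agent", "accept", "accept-language", "accept-encoding"}


def parse_request(raw: str):
    """Split a raw HTTP request into method, target and a lower-cased header dict."""
    head = raw.split("\r\n\r\n", 1)[0]
    lines = head.split("\r\n")
    method, target, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return method, target, headers


def looks_like_scripted_download(raw: str, max_headers: int = 3) -> bool:
    """Flag GET requests with few headers and most browser headers missing."""
    method, _target, headers = parse_request(raw)
    missing = EXPECTED_BROWSER_HEADERS.difference(headers)
    return method == "GET" and len(headers) <= max_headers and len(missing) >= 3


minimal = (
    "GET /example/path/ HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "Connection: Keep-Alive\r\n\r\n"
)
browser_like = (
    "GET /example/path/ HTTP/1.1\r\n"
    "Host: example.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    "Accept: text/html\r\n"
    "Accept-Language: en-US\r\n"
    "Accept-Encoding: gzip, deflate\r\n"
    "Connection: keep-alive\r\n\r\n"
)
print(looks_like_scripted_download(minimal))       # True
print(looks_like_scripted_download(browser_like))  # False
```

A hit from a check like this is context rather than a verdict; it becomes more interesting when the same host also receives a PE file in the response.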
As previously alluded to, I developed a Python script that can identify the obfuscated base64 string and the padding value, decode the payload, and extract the download URLs from the PowerShell script. Figure 9 shows sample output from the samples analyzed during this work.
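The script itself is not reproduced here. As a minimal sketch of just the decode and URL-extraction stages, the example below assumes the payload is base64 over UTF-16LE text (the encoding PowerShell expects for encoded commands) and that URLs appear in the clear inside the decoded script, separated by a delimiter such as “@”. The regular expression, function names, and example.com/example.org URLs are all hypothetical; scripts that build URLs through string concatenation would need extra handling.

```python
import base64
import re
from typing import List

# Stop a URL at whitespace, quotes and a few assumed list delimiters.
URL_PATTERN = re.compile(r"https?://[^\s'\"@*,;()]+", re.IGNORECASE)


def decode_powershell(b64_text: str) -> str:
    """Decode a base64 string into PowerShell source, assuming UTF-16LE."""
    padded = b64_text + "=" * (-len(b64_text) % 4)  # restore stripped '=' padding
    return base64.b64decode(padded).decode("utf-16-le", errors="replace")


def extract_urls(script: str) -> List[str]:
    """Return unique URLs in order of first appearance."""
    seen: List[str] = []
    for url in URL_PATTERN.findall(script):
        if url not in seen:
            seen.append(url)
    return seen


# Build a stand-in payload so the example is self-contained.
demo_script = "$urls='http://example.com/a/@https://example.org/b/'.Split('@')"
demo_b64 = base64.b64encode(demo_script.encode("utf-16-le")).decode("ascii")

print(demo_b64[:2])                                # JA
print(extract_urls(decode_powershell(demo_b64)))   # ['http://example.com/a/', 'https://example.org/b/']
```

The “JA” prefix in the output is not a coincidence: any UTF-16LE script that begins with “$” encodes to base64 starting with “JA”, which matches the start of the valid string noted earlier.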
Observing Changes in Communication
Once the Emotet binary is executed on the host, the pattern of communication changes. An infection begins with an initial check-in which includes information about the host. This information is encrypted and encoded before transmission, making it more difficult to detect evidence of data exfiltration in the network traffic. This information is sent to any number of different hosts within Emotet’s C2 infrastructure – commonly referred to as epochs. Figure 10 shows the initial check-in.
In contrast to the HTTP request made to retrieve the Emotet trojan, the check-in attempts to blend in better with normal network traffic. To begin, the URI structure uses fewer randomly generated string values and relies instead on dictionary words. There is an HTTP Referer header, even though the request was not initiated by a user clicking in their browser. The request is made directly to an IP address. Non-standard ports for HTTP traffic are regularly used, and common ports are mixed with protocols not normally seen on them, such as non-HTTPS traffic over port 443. Finally, there is the use of a common user-agent string. All of these characteristics can be observed in Figures 10 and 11.
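Several of these characteristics are easy to test for mechanically once a request has been parsed. The sketch below is one hypothetical way to collect them as named indicators; it assumes the request was observed in cleartext, that the Host header holds a hostname or IPv4 address with an optional port, and that port 80 is the only "standard" HTTP port. The indicator names and the 203.0.113.10 address (a documentation range) are illustrative only.

```python
import ipaddress
from typing import Dict, List

COMMON_HTTP_PORTS = {80}  # assumption: anything else is worth noting


def is_ip_literal(host: str) -> bool:
    """True when the host is an IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def checkin_indicators(headers: Dict[str, str], dst_port: int) -> List[str]:
    """List which check-in traits a cleartext HTTP request exhibits.

    `headers` maps lower-cased header names to values.
    """
    found: List[str] = []
    host = headers.get("host", "").split(":")[0]  # drop an optional :port
    if is_ip_literal(host):
        found.append("direct-to-ip")
    if dst_port == 443:
        found.append("cleartext-http-on-443")
    elif dst_port not in COMMON_HTTP_PORTS:
        found.append("non-standard-port")
    if "referer" in headers:
        found.append("referer-present")
    return found


suspect = {
    "host": "203.0.113.10:443",
    "referer": "http://203.0.113.10/",
    "user-agent": "Mozilla/5.0",
}
print(checkin_indicators(suspect, 443))
# ['direct-to-ip', 'cleartext-http-on-443', 'referer-present']
print(checkin_indicators({"host": "example.com"}, 80))
# []
```

None of these traits is malicious on its own; the value comes from seeing several of them together on a host that recently fetched an executable.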
The second stage C2 nodes remain more consistent and are therefore slightly easier to track. Projects like Abuse.ch provide aggregated data on all known and active Emotet C2 nodes, along with IDS rulesets that can be used to enrich malware analysis.
Several malware families rely on configuration information, which is extracted in memory during runtime. After capturing the memory of the Emotet process, or of the entire virtual machine, an analyst can extract this configuration information using a combination of tools such as Volatility and MalConfScan. As can be seen in Figure 13, the RSA key used to encrypt communication, along with pre-configured C2 IP addresses/ports, can be extracted by these tools.
This information can be used to correlate samples under analysis with other published information. The Cryptolaemus group posts daily updates on Emotet activity based on its tracking. In the sample used for this blog post, we were able to extract the RSA public key in addition to a series of C2 endpoints. As can be seen in Figure 14, the extracted RSA public key matches the one published by the Cryptolaemus group for epoch 3.
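Comparing keys by eye is error-prone when two sources wrap or space the text differently. A minimal sketch of a more robust comparison, assuming both keys are available as PEM text, is to strip the armor lines and whitespace, decode the base64 body, and compare the raw bytes (or a hash of them). The function names are hypothetical and the key material below is a placeholder, not a real Emotet key.

```python
import base64
import hashlib


def pem_body_bytes(pem: str) -> bytes:
    """Decode the base64 body of a PEM block, ignoring armor lines and wrapping."""
    lines = [line.strip() for line in pem.strip().splitlines()]
    body = "".join(line for line in lines if line and not line.startswith("-----"))
    return base64.b64decode(body)


def key_fingerprint(pem: str) -> str:
    """SHA-256 over the decoded key bytes, handy for notes and lookups."""
    return hashlib.sha256(pem_body_bytes(pem)).hexdigest()


def same_key(pem_a: str, pem_b: str) -> bool:
    return pem_body_bytes(pem_a) == pem_body_bytes(pem_b)


# Placeholder material so the example runs without a real key.
body = base64.b64encode(b"placeholder key material, not a real RSA key").decode("ascii")
extracted = "-----BEGIN PUBLIC KEY-----\n" + body + "\n-----END PUBLIC KEY-----\n"
published = (
    "-----BEGIN PUBLIC KEY-----\n"
    + "\n".join(body[i:i + 20] for i in range(0, len(body), 20))  # different wrapping
    + "\n-----END PUBLIC KEY-----"
)
print(same_key(extracted, published))  # True
```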
MD5 for Figure 3: 1f21fd47803e3fb295c5e551225623c5
MD5 for Figure 4 and 5: 1c8afbe17fd7844228f9936b0cb03f25