In this post, we will look at ways to identify and analyze a user form in an Office document. As I discussed in a previous post, malware authors often use user forms to store the resources they need, such as scripts (PowerShell, VBS), shellcode and strings. We will use oledump to assist in our analysis, and by the end of this post you will be able to identify user forms and trace how they and their objects are used throughout the macro code. For this analysis, we will be looking at the following malicious Office document.
Identifying User Forms with OLEDUMP
Identifying user forms in an Office document is not as straightforward as you may think. Running oledump on our sample, we see no obvious evidence that a user form is present, although we can clearly see macro streams at indexes 8, 9 and 12.
I started to notice that Office documents containing a user form also contain streams whose names end with ‘f’ and ‘o’. In this example, these streams are at indexes 17 and 18. One way to confirm that this document has a user form is to open it in Microsoft Office and inspect the associated VBA project using the integrated IDE. In the following screenshot, you can see a form named discord in the project view.
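The 'f' and 'o' heuristic above can be sketched as a small check over a list of stream names. This is only a rough illustration: the function name find_form_storages is made up for this example, and the stream paths in the listing (including the "Macros/" prefix and the cowkeeper module path) are assumed for demonstration rather than copied from the sample's real oledump output.

```python
from collections import defaultdict


def find_form_storages(stream_names):
    """Return storages that contain both an 'f' and an 'o' stream.

    stream_names is a list of slash-separated stream paths, similar to
    the names oledump prints next to each stream index.
    """
    leaves = defaultdict(set)
    for name in stream_names:
        storage, sep, leaf = name.rpartition("/")
        if sep:  # skip top-level streams that have no parent storage
            leaves[storage].add(leaf)
    return sorted(s for s, found in leaves.items() if {"f", "o"} <= found)


# Hypothetical listing: these paths are assumptions for illustration only.
streams = [
    "Macros/VBA/ThisDocument",
    "Macros/VBA/cowkeeper",
    "Macros/VBA/dir",
    "Macros/discord/f",
    "Macros/discord/o",
]

print(find_form_storages(streams))
```

A hit from a check like this is a hint rather than proof, so it is still worth confirming the form in the IDE as described above.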
Tracing the Use of a User Form
If you encountered the following snippet of VBA from stream 8, it would be difficult (at least initially) to trace where the discord variable is defined.
This is largely because it is not defined anywhere in the macro streams. We can see that it is likely an important object, as a value from it is assigned to periapt, which is later passed to the substring function Right. The result is assigned to failed and passed as an argument to the function apocope in the cowkeeper stream. Fortunately, we were able to identify the name of the user form in the IDE and can now associate any use of this object with the user form itself. The user form will contain other objects that store the content needed by the code; these come in a variety of types such as text boxes, frames and tabs. In our sample code, the ControlTipText property of the playbill object is being accessed. You can further explore these objects in the Office IDE.
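Once the form name is known, finding every place the macros touch it can be partly automated. The sketch below searches macro source text for references of the form discord.control.property. The helper name find_form_references is invented for this example, and the VBA in sample is a hypothetical reconstruction from the description above (the Sub name and the arguments to Right are placeholders), not the actual code from the document.

```python
import re


def find_form_references(vba_source, form_name):
    """List (line number, control, property) for each use of form_name."""
    pattern = re.compile(
        r"\b" + re.escape(form_name) + r"\.(\w+)(?:\.(\w+))?",
        re.IGNORECASE,  # VBA identifiers are case-insensitive
    )
    hits = []
    for line_no, line in enumerate(vba_source.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_no, match.group(1), match.group(2)))
    return hits


# Hypothetical macro text; only the identifier names come from the post.
sample = """\
Sub placeholder()
    periapt = discord.playbill.ControlTipText
    failed = Right(periapt, Len(periapt) - 1)
    pristis = apocope(failed)
End Sub
"""

print(find_form_references(sample, "discord"))
```

In practice the input would be the decompressed macro source extracted from each macro stream, and each hit tells you which control and property to inspect in the IDE.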
What’s in a User Form?
It won’t always be easy to recognize the content in a user form, even if you can access it directly in the IDE. Often this content is obfuscated, whether by encoding or encryption. In addition, after deobfuscation it may turn out to be binary content such as shellcode, which requires binary analysis tools such as a disassembler. That is actually the case for this document. While we won’t go into exhaustive analysis, we can finish tracing the macros to get a high-level understanding of what this document was up to.
At this point we know that content from the user form was passed to the apocope function. This returns a value which is assigned to the variable pristis. Apocope contains enough code that we don’t necessarily want to analyze it at this point. Instead, we can trace how the variable pristis is used and see if we can deduce its purpose. Later in the code, pristis is assigned to meetinghouse, which is then passed to the function foam.
Foam is interesting in that it contains calls to the Windows API, which is another aspect of this document. Functions from the Windows API can be defined and referenced through a function pointer (in VBA, typically a Declare statement that gives the API function an alias). For the function foam, you’ll see that it calls both VirtualAllocEx and RtlMoveMemory through the function pointers betterment and antecedency.
We can also trace the argument to this function, which is the content from the user form, and see that it is eventually used as an argument to antecedency. Based on the analysis so far, we know that the macros allocate memory through a call to VirtualAllocEx and then copy the content from the user form into it using a call to RtlMoveMemory. We can also now suspect that the call to apocope was there to deobfuscate the content from the user form, so we do not need to analyze its functionality in detail.
To continue our analysis, we can trace the return value from the function foam, which is assigned to the variable bayberry. Bayberry is later concatenated with a hard-coded value from the variable anklet, and the result is assigned to a new variable, aprum.
Next, aprum is passed to the function cabriolet. If you search for this function in the macro code, you will see that it is defined as a function pointer to EnumDateFormatsW.
But why a call to EnumDateFormatsW? At this point in our analysis, we suspect that aprum points to recently allocated memory that contains content from our user form. If that content happens to be shellcode, then it would make sense that the macros are now trying to execute it. You can look up the function EnumDateFormatsW on MSDN and see that the first argument is in fact a function pointer! Please note that the reference used here is for EnumDateFormatsA rather than EnumDateFormatsW, as documentation for the latter could not be found online. The two functions are equivalent apart from string handling: Microsoft’s convention is to end a function name with ‘A’ if it expects ANSI strings and with ‘W’ if it expects wide-character strings.
Confirming with Dynamic Analysis
We can use the debugger in the Office IDE to help finish tracing this functionality. By setting a breakpoint on the call to cabriolet (i.e. EnumDateFormatsW), we can inspect the value of the first argument, aprum. This value is the location of the callback function for EnumDateFormatsW and should be the address of the beginning of the shellcode.
The Office IDE allows you to set watches on variables, which opens a window that tracks the value of each variable during execution. You can also hover over a variable to see its value in a pop-up dialog. In either case, the value is presented in base 10. You can use a calculator to convert this value to base 16, or hex, which makes it easier to identify the appropriate location in virtual memory.
Once the virtual address is known, you can view the contents of memory using a tool like Process Hacker 2 (PH2). PH2 shows the memory breakdown of the processes running on the host and, in this case, allows us to investigate the allocation containing 0x70D0E5D. Keep in mind that this address is not the base of the allocation but an offset further into it. The base of the new allocation will likely have its lower two bytes set to zero.
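The conversion and the base/offset split can also be done in a few lines of code instead of a calculator. The sketch below makes some assumptions: the watch shows a signed 32-bit VBA Long (so addresses above 0x7FFFFFFF would appear negative), and the allocation base is aligned to 64 KB, the usual Windows allocation granularity. The function name split_address is made up for this example, and the decimal value is simply 0x70D0E5D written in base 10.

```python
ALLOCATION_GRANULARITY = 0x10000  # 64 KB alignment, assumed for the base


def split_address(watch_value, granularity=ALLOCATION_GRANULARITY):
    """Turn a decimal watch value into (address, assumed base, offset)."""
    # A VBA Long is signed, so mask it to get the unsigned 32-bit address.
    address = watch_value & 0xFFFFFFFF
    base = address - (address % granularity)
    return address, base, address - base


watch_value = 118296157  # decimal form of the address 0x70D0E5D
address, base, offset = split_address(watch_value)
print(hex(address), hex(base), hex(offset))
```

The base computed this way is only a guess from alignment; the memory view in PH2 remains the authority on where the allocation actually starts.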
You can now extract this shellcode from memory and begin disassembly at an offset of +0xE5D. Viewing this content in memory, you’ll see that at this offset is the hex value 55, which is the opcode for the instruction push ebp. This instruction is commonly found at the beginning of a function prologue and helps to confirm that our analysis is on the right track!
In this post, we looked at ways to identify and trace the use of user forms in a malicious Office document. We then connected the use of the form directly to the macro code. This also allowed us to briefly explore how the Windows API can be used directly in VBA. Finally, we determined that this document was used to stage shellcode into memory and begin its execution, which means we need to disassemble this code to continue our analysis.
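If you save the memory region to a file, the same sanity check can be scripted. The sketch below looks for push ebp (0x55) at a given offset and additionally reports whether it is followed by mov ebp, esp, the other half of a typical 32-bit prologue. Only the 0x55 byte is confirmed for this sample; the follow-up bytes, the function name describe_entry and the synthetic region buffer are assumptions for illustration.

```python
PUSH_EBP = 0x55
# Both encodings of "mov ebp, esp" that x86 assemblers may emit.
MOV_EBP_ESP = (b"\x8b\xec", b"\x89\xe5")


def describe_entry(region, offset):
    """Describe how much of a 32-bit function prologue starts at offset."""
    if not 0 <= offset < len(region) or region[offset] != PUSH_EBP:
        return "no prologue"
    if region[offset + 1:offset + 3] in MOV_EBP_ESP:
        return "push ebp; mov ebp, esp"
    return "push ebp"


# Synthetic stand-in for a dumped region: zero padding, then a prologue.
region = bytes(0xE5D) + b"\x55\x8b\xec" + bytes(16)
print(describe_entry(region, 0xE5D))
```

A match is a quick hint that the offset is a plausible entry point, not a substitute for actually disassembling the code.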