This time I want to show you a nice method of making the obfuscation layer of Office Visual Basic for Applications (VBA) macros go away. You’ll get to see the clear motivation and instructions the malware is giving without dealing with the garbage it shows to the unlucky person trying to figure out what it does.
As always, when we do a task to learn something it’s good to have a sample to follow so you can replicate this on your own system. Today we’ll be following the sample f5858eb5772eba0b6c066aebdd1efbdefed71a6a. It should not be any surprise that there are macros in it.
After you extract the macros themselves (e.g. use olevba) you’ll see the whole obfuscated source-tree that represents the macros. You’ll see a “Sub AutoOpen” and a “Sub Workbook_Open”, both of which call “AddSpace”, which in turn calls “RemoveParagraph”. So if we look at “RemoveParagraph” it looks like this:
Maybe your eyes are better than mine, but I struggle to read in clear-text what it will do. Here you can see the decryption function “sss” being called extensively to decode many layers for each string. How can we find out what this is, fast, without having to open a real Virtual Machine with Windows and Office (and their licenses), run it, log it all and analyse the output? That’s the challenge of today.
VBA macros are written in a programming language, and every programming language has rules. VBA is not that different from other languages. When you run it, it’s compiled, but can we run this source code easily without Office? To understand if we can, we need to understand the code and how to interpret it.
Let’s walk through the essence of the sss() function to get a feeling for how VBA code is written to accomplish its tasks:
In this function-declaration, we see a function being named “sss” with a parameter “sString” defined as the type String. This function will return a data-type String as well.
With VBA it seems variables are quite strict in their declaration. Here we see the following local variables being declared:
- bOut and bIn as type Byte
- bTrans as an array of Bytes (table of 255 entries)
- OOOPOOOOPOOOO6, OOOPOOOOPOOOO12, OOOPOOOOPOOOO18 as arrays of Long (table of 63 entries)
- lQuad as a Long
- iPad as an Integer
- BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB as a Long
- AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA as a Long
- sOut as a String
- and finally mnAjUYt as a Long
So this sets up the scene of what support it needs for its variables to accomplish the task, whatever that is. Now it wants to start to run some code:
    sString = Replace(sString, vbCr, vbNullString)
    sString = Replace(sString, vbLf, vbNullString)
The variable sString will receive the result of the function Replace which takes as input the original sString, a VBA constant vbCr and another constant, vbNullString. I assume it wants to replace all CarriageReturns and LineFeeds with nothing.
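In Python the same clean-up could be sketched with `str.replace`. The helper name `strip_newlines` is made up for illustration; only the two replacements come from the macro.

```python
def strip_newlines(s_string: str) -> str:
    """Mimic the two VBA Replace calls: drop every CR and every LF."""
    s_string = s_string.replace("\r", "")  # vbCr is Chr(13)
    s_string = s_string.replace("\n", "")  # vbLf is Chr(10)
    return s_string


print(strip_newlines("QUJD\r\nREVG"))  # QUJDREVG
```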
Now it wants to work out the padding and build the bTrans table:
    mnAjUYt = Len(sString) Mod 4
    If InStrRev(sString, "==") Then
        iPad = 2
    ElseIf InStrRev(sString, "=") Then
        iPad = 1
    End If
    For mnAjUYt = 0 To 255
        Select Case mnAjUYt
            Case 65 To 90
                bTrans(mnAjUYt) = mnAjUYt - 65
            Case 97 To 122
                bTrans(mnAjUYt) = mnAjUYt - 71
            Case 48 To 57
                bTrans(mnAjUYt) = mnAjUYt + 4
            Case 43
                bTrans(mnAjUYt) = 62
            Case 47
                bTrans(mnAjUYt) = 63
        End Select
    Next mnAjUYt
If I follow the logic, VBA as a language isn’t that different from, say, Python. So what would this function look like in the Python world? Let’s have a try at converting the previous code to Python:
    bTrans = bytearray(256)      # indices 0..255, all initialised to 0
    iPad = 0                     # VBA Integers start at 0
    mnAjUYt = len(sString) % 4
    if sString.rfind("==") != -1:
        iPad = 2
    elif sString.rfind("=") != -1:
        iPad = 1
    for mnAjUYt in range(0, 256):    # "0 To 255" includes 255
        if mnAjUYt >= 65 and mnAjUYt <= 90:
            bTrans[mnAjUYt] = mnAjUYt - 65
        elif mnAjUYt >= 97 and mnAjUYt <= 122:
            bTrans[mnAjUYt] = mnAjUYt - 71
        elif mnAjUYt >= 48 and mnAjUYt <= 57:
            bTrans[mnAjUYt] = mnAjUYt + 4
        elif mnAjUYt == 43:
            bTrans[mnAjUYt] = 62
        elif mnAjUYt == 47:
            bTrans[mnAjUYt] = 63
That isn’t much different. So, could we convert the entire VBA source-code to Python? Why not give that a try.
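As a side note, the value ranges in the bTrans table look a lot like the standard Base64 alphabet (A-Z, a-z, 0-9, + and /). That is my reading of the numbers, not something the sample states, so here is a quick sanity check as a sketch. The loop variable is renamed to `code` for readability.

```python
import string

# Build the lookup table the same way the macro does.
b_trans = bytearray(256)
for code in range(256):          # VBA "0 To 255" is inclusive
    if 65 <= code <= 90:         # 'A'..'Z' map to 0..25
        b_trans[code] = code - 65
    elif 97 <= code <= 122:      # 'a'..'z' map to 26..51
        b_trans[code] = code - 71
    elif 48 <= code <= 57:       # '0'..'9' map to 52..61
        b_trans[code] = code + 4
    elif code == 43:             # '+' maps to 62
        b_trans[code] = 62
    elif code == 47:             # '/' maps to 63
        b_trans[code] = 63

# The standard Base64 alphabet, in value order.
BASE64_ALPHABET = (
    string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
)

matches = all(
    b_trans[ord(ch)] == value for value, ch in enumerate(BASE64_ALPHABET)
)
print(matches)  # True
```

If the check holds, it suggests that at least this part of sss is a Base64-style lookup; the rest of the function still has to be read on its own terms.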
Notice some differences in the languages:
- The For loop has to be written differently in Python syntax, and the VBA upper limit is inclusive
- Case needs to be rewritten as an if (with a range check for some)
- VBA uses variable(index) but Python uses variable[index]
- Python’s infamous ‘:’ has to be inserted at the end of every line that opens a block
- Python’s indentation rules need to be followed
- Python needs to use bytearray() as the type to store the binary values in an array
- In Python 3 the divide operator ‘/’ returns a float and not an int, so use ‘//’ where an integer result is needed
- Functions are called with parameters using ( and )
All in all, it’s not that difficult to convert VBA macro code to Python code that will run natively on your box and could point out the behavior hidden underneath the obfuscation.
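A few of the differences listed above, shown in isolation. This is only a sketch; the helper names `vba_range` and `in_case_range` are made up for illustration and are not part of any library.

```python
def vba_range(first: int, last: int) -> range:
    """VBA 'For i = first To last' includes last; Python's range() does not."""
    return range(first, last + 1)


def in_case_range(value: int, low: int, high: int) -> bool:
    """Equivalent of a VBA 'Case low To high' branch."""
    return low <= value <= high


table = bytearray(256)            # byte table, indexed with [] instead of ()
for i in vba_range(0, 255):
    if in_case_range(i, 65, 90):
        table[i] = i - 65

print(len(vba_range(0, 255)))     # 256
print(7 / 2, 7 // 2)              # 3.5 3
```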
Python code that executes the same logic:
Ok, so translating VBA to Python can be done by a human, and following the same rules, it can be done by a program. Isn’t that why we have computers?
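To give a feeling for what "done by a program" could mean, here is a deliberately tiny sketch of a line-by-line translator. It only covers the handful of statements seen above (For, If/ElseIf/Else, block ends, Mod) and ignores indentation, Select Case and everything else a real converter needs. `translate_line` is a hypothetical name, not the tool used for this post.

```python
import re


def translate_line(vba: str) -> str:
    """Translate one simple VBA statement to Python (illustrative subset only)."""
    line = vba.strip()

    # For i = a To b  ->  for i in range(a, b + 1):
    m = re.fullmatch(r"For (\w+) = (\w+) To (\w+)", line)
    if m:
        var, first, last = m.groups()
        return f"for {var} in range({first}, {last} + 1):"

    # If cond Then / ElseIf cond Then  ->  if cond: / elif cond:
    m = re.fullmatch(r"(ElseIf|If) (.+) Then", line)
    if m:
        keyword = "elif" if m.group(1) == "ElseIf" else "if"
        return f"{keyword} {m.group(2)}:"

    if line == "Else":
        return "else:"

    # Block terminators disappear; Python expresses them by dedenting.
    if line in ("End If", "End Select") or line.startswith("Next"):
        return ""

    # Plain statements: only the Mod operator is rewritten here.
    return line.replace(" Mod ", " % ")


print(translate_line("For mnAjUYt = 0 To 255"))
print(translate_line('ElseIf InStrRev(sString, "=") Then'))
print(translate_line("mnAjUYt = Len(sString) Mod 4"))
```

Calls such as Len and InStrRev are left untouched on purpose, on the assumption that a small support library provides them under the same names.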
Once you understand the syntax and operation, you can easily move this logic programmatically to e.g. Python or any other language you prefer.
There are some interesting challenges. VBA arrays seem to give a bit more “legroom”: the number in a VBA array declaration is the upper bound, not the element count, so with the default lower bound of 0 an array declared with 255 can be indexed from 0 to 255. In Python, if you access one over the last member of a list or bytearray it throws an exception. So, in the Python conversion you need to leave room for one more element.
Adding checks that look for infinite loops is also something that needs to be done. You don’t want to spend a microsecond more here than you have to.
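One way to leave that extra space, as a sketch: assuming the macro uses VBA’s default lower bound of 0 (no `Option Base 1`), a declaration with upper bound N needs N + 1 slots in Python. The helper names are hypothetical.

```python
def vba_byte_array(upper_bound: int) -> bytearray:
    """Allocate indices 0..upper_bound, like a VBA Byte array with that upper bound."""
    return bytearray(upper_bound + 1)


def vba_long_array(upper_bound: int) -> list:
    """Same idea for an array of Long; Python ints stand in for VBA Longs."""
    return [0] * (upper_bound + 1)


b_trans = vba_byte_array(255)
b_trans[255] = 1              # valid, just as bTrans(255) is in VBA
print(len(b_trans))           # 256
print(len(vba_long_array(63)))  # 64
```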
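A minimal sketch of such a safeguard: an iteration budget that the generated code ticks once per loop pass. The class names and the default limit are my own choices for illustration.

```python
class LoopBudgetExceeded(RuntimeError):
    """Raised when translated code iterates more often than we allow."""


class LoopGuard:
    def __init__(self, max_iterations: int = 1_000_000):
        self.max_iterations = max_iterations
        self.count = 0

    def tick(self) -> None:
        """Call once per loop iteration in the generated code."""
        self.count += 1
        if self.count > self.max_iterations:
            raise LoopBudgetExceeded(
                f"gave up after {self.max_iterations} iterations"
            )


guard = LoopGuard(max_iterations=1000)
try:
    while True:       # stands in for a translated loop that never exits
        guard.tick()
except LoopBudgetExceeded as exc:
    print("stopped:", exc)
```

A wall-clock timeout would work too; a counter has the advantage of giving the same result on every machine.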
The Office VBA environment
What about the environment? VBA code calls some functions which are resolved by the Office app on top of the operating system. Well, create them in your desired language. Once. The script language will already have some of them, others you’d have to write. Keep them as a library you always import at line 1 of the converted sample. It needs to represent the VBA world for the sample, now running in Python or your language of choice. Python’s object model fits beautifully in this world too, which allows you to reuse your “VBA runtime” support.
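As a sketch of what the start of such a library might look like, here are Python stand-ins for a few VBA built-ins. The function names mirror the VBA ones on purpose so translated code can call them unchanged; the library itself is hypothetical and far from complete.

```python
from typing import Optional

# Constants the sample refers to.
vbCr = "\r"
vbLf = "\n"
vbNullString = ""


def Len(s: str) -> int:
    return len(s)


def Replace(expression: str, find: str, replacement: str) -> str:
    return expression.replace(find, replacement)


def InStrRev(string_check: str, string_match: str) -> int:
    """1-based position of the last match, 0 when absent (VBA convention)."""
    return string_check.rfind(string_match) + 1


def Mid(s: str, start: int, length: Optional[int] = None) -> str:
    """VBA's Mid counts characters from 1, not 0."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def Chr(code: int) -> str:
    return chr(code)


def Asc(s: str) -> int:
    return ord(s[0])


print(InStrRev("QUJD==", "=="))   # 5
print(Mid("abcdef", 2, 3))        # bcd
```

Because InStrRev returns 0 when nothing is found, a translated `if InStrRev(...):` keeps the truthiness the VBA code relies on.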
A bit specific to this one, but you get the gist of the support needed to support the VBA world of this sample. All we want is the behaviour, not any modification on our system. (NB! Of course fetching the variable should depend on what you want etc, but it’s a PoC).
So for instance, the Shell API won’t actually run the shell; it will just report that this would have happened.
So at the end you have Python code that doesn’t do anything on your system, and you can just let it run like this and sit back and relax:
Now I skipped the part where the macro actually extracts the binary from the Document resources, but that could easily be added too (the VBA object would need to access the document stream). Then you’d have the decrypted executable file too, ready for analysis. And this generated script can run on any platform with Python; no Windendo (as my brother calls Windows) is required.
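A sketch of what such a harmless Shell stand-in could look like. The `observed` log and the example command are made up for illustration; the default window style of 2 and the numeric return value follow VBA’s Shell function.

```python
observed = []   # behaviour log collected while the translated macro runs


def Shell(pathname: str, windowstyle: int = 2) -> int:
    """Stand-in for VBA's Shell: record the command line, never execute it."""
    observed.append(("Shell", pathname))
    print(f"[Shell] would execute: {pathname}")
    return 0    # placeholder task ID; nothing is launched


Shell("cmd.exe /c echo example")   # made-up command, not taken from the sample
```

Every other API with side effects (file writes, downloads, object creation) can follow the same pattern: log the arguments, return something plausible, touch nothing.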
Could other script-languages be converted into other types of languages for ease of understanding the content outside their natural environment? Interesting idea. Let me get back to you on that one.
Next time we’ll look through a very recent sample using multiple documents, forms and other fun things. How can we run this as Python? Stay tuned!
I believe malware researchers should have automatic tools like this to get the real context of the samples they need to analyze. I also believe any gateway out there should have automatic tools like this to easily extract context and threat intelligence and be able to block/alert based on the real content. This would add to the quality of the decisions.
If you need help, contact me at LibNotFound!