Excel Formula, or XLM: Does It Ever Stop Giving Researchers Pain?
On Friday I got a new sample in the xlsb file format that supposedly contained malicious code. I had a quick look, and wow: this one was different. My first check on VirusTotal (VT) showed that it hadn’t been uploaded to VT yet. So, with nothing to go on, I started digging into the sample myself.
Structurally, it’s a Microsoft Excel 2007+ document (a ZIP container) holding the following files:
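If you want to reproduce that file listing yourself: an .xlsb file is a plain ZIP archive, so Python’s standard zipfile module can open it. This minimal sketch lists only the binary parts (the names ending in .bin, which hold the records discussed below); the function name list_binary_parts and the command-line usage are hypothetical choices of mine:

```python
import io
import sys
import zipfile


def list_binary_parts(data: bytes) -> list[str]:
    """Return the sorted names of all .bin parts in an .xlsb file's raw bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".bin"))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python list_parts.py sample.xlsb  (script and file names are hypothetical)
    with open(sys.argv[1], "rb") as fh:
        for part in list_binary_parts(fh.read()):
            print(part)
```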
So naturally we look at xl/macrosheets/sheet1.bin, right? First, though, we need to enumerate its records. xl/macrosheets/sheet1.bin looks like this:
So how are the records stored? The answer is in Microsoft’s documentation. To establish the recordId, you read the first byte (0x81). Since its high bit (0x80) is set, another byte follows that also contributes to the recordId. Masking off the high bit for now leaves 0x01. The next byte is 0x01, and because its high bit isn’t set, it is the last byte of the recordId; its value is multiplied by 0x80 (128). The recordId is therefore (1*128)+1 = 129, which is BrtBeginSheet. The length is encoded the same way: the next byte is 0x00, its high bit isn’t set, so no further length byte follows, and the remaining 7 bits are 0, so the record has no data.
The next record is BrtWsProp with recordId 147 and length 23.
recordId: (0x93 & 0x7F) + (0x01 * 0x80) = 147 (0x93)
length: 0x17 & 0x7F = 23 (0x17)
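The rule above is a small variable-length integer: each byte contributes its low 7 bits, and a set high bit means another byte follows. Here is a minimal Python sketch of it, assuming (per [MS-XLSB]) that the recordId takes at most 2 bytes and the length at most 4; the helper names read_varint and read_record_header are hypothetical:

```python
def read_varint(buf: bytes, pos: int, max_bytes: int) -> tuple[int, int]:
    """Decode a 7-bit-per-byte little-endian varint starting at buf[pos].

    Returns (value, new_pos). The high bit (0x80) of each byte means
    "another byte follows"; the low 7 bits carry the value.
    """
    value = 0
    for i in range(max_bytes):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError(f"varint at offset {pos} exceeds {max_bytes} bytes")


def read_record_header(buf: bytes, pos: int) -> tuple[int, int, int]:
    """Return (record_id, length, data_offset) for the record at buf[pos]."""
    record_id, pos = read_varint(buf, pos, max_bytes=2)
    length, pos = read_varint(buf, pos, max_bytes=4)
    return record_id, length, pos


# BrtBeginSheet: 81 01 00 -> id 129, no data
print(read_record_header(bytes([0x81, 0x01, 0x00]), 0))  # (129, 0, 3)
# BrtWsProp:     93 01 17 -> id 147, 23 bytes of data
print(read_record_header(bytes([0x93, 0x01, 0x17]), 0))  # (147, 23, 3)
```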
Repeat this for every record and you get a nice list. Unfortunately, parsing the records of xl/macrosheets/sheet1.bin turned up nothing unusual.
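Walking a whole .bin part is that decoding in a loop: read the recordId, read the length, take that many bytes, repeat. A minimal sketch, assuming the whole part is already in memory; RECORD_NAMES only covers the record ids named in this article, and the 23 data bytes in the example are placeholders, not the real BrtWsProp contents:

```python
from typing import Iterator, NamedTuple

# Record ids mentioned in this article (names as in [MS-XLSB]).
RECORD_NAMES = {0: "BrtRowHdr", 8: "BrtFmlaString", 11: "BrtFmlaError",
                129: "BrtBeginSheet", 147: "BrtWsProp"}


class Record(NamedTuple):
    offset: int
    record_id: int
    data: bytes

    @property
    def name(self) -> str:
        return RECORD_NAMES.get(self.record_id, f"unknown({self.record_id})")


def _varint(buf: bytes, pos: int, max_bytes: int) -> tuple[int, int]:
    """7 bits per byte, high bit set means another byte follows."""
    value = 0
    for i in range(max_bytes):
        b = buf[pos + i]
        value |= (b & 0x7F) << (7 * i)
        if not b & 0x80:
            return value, pos + i + 1
    raise ValueError(f"bad varint at offset {pos:#x}")


def iter_records(buf: bytes) -> Iterator[Record]:
    """Yield every record in a .bin part: recordId varint, length varint, data."""
    pos = 0
    while pos < len(buf):
        start = pos
        record_id, pos = _varint(buf, pos, 2)
        length, pos = _varint(buf, pos, 4)
        data = buf[pos:pos + length]
        if len(data) != length:
            raise ValueError(f"truncated record at offset {start:#x}")
        yield Record(start, record_id, data)
        pos += length


stream = bytes([0x81, 0x01, 0x00, 0x93, 0x01, 0x17]) + bytes(23)  # placeholder data
for rec in iter_records(stream):
    print(f"{rec.offset:#x} {rec.name} len={len(rec.data)}")
```

Record ids outside the small table show up as unknown(...), which is still enough to see the overall structure of a part.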
On to the other sheets, then: what can we find there? Quite a few records, actually. The ones we’re interested in for now, while we learn, are:
|recordId|Record|Description|
|---|---|---|
|0|BrtRowHdr|Tells you which row you are currently on|
|8|BrtFmlaString|Holds a cached string value and the pcode (parsed expression) that builds this string|
|11|BrtFmlaError|Holds the pcode (parsed expression) of a formula whose cached result is an error|
Let’s have a brief look at the data we need.
Microsoft has documented this well in the [MS-XLSB] specification PDF. A BrtFmlaString record starts with an 8-byte cell structure, followed by a variable-length XLWideString (a length-prefixed Unicode string), 2 bytes of grbitFlags, and then the formula itself (a CellParsedFormula structure).
The first one you’ll find is this:
which after decoding looks like this:
RECORD: BrtFmlaString (Id 8, offset 58d), LENGTH: 30
col: 26, row: 20 | strlen=1 : "/"
1E 2F 00 PtgInt: 47
41 6F 00 PtgFunc: CHAR (111)
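The pcode is stored in reverse Polish notation, so the two tokens above (PtgInt 47, then PtgFunc CHAR) can be evaluated with a simple stack. This minimal sketch handles only those two ptg types; the one-entry function table and the use of Python’s chr() for CHAR (which in Excel depends on the system code page, but agrees for ASCII) are simplifying assumptions, and eval_rgce is a hypothetical name:

```python
import struct

# Function-table ids needed for this record only: 111 is CHAR (1 argument).
FUNCTIONS = {111: ("CHAR", 1, chr)}


def eval_rgce(rgce: bytes):
    """Evaluate a tiny subset of pcode: PtgInt and PtgFunc.

    Parsed expressions are stored in reverse Polish notation,
    so a stack is enough to evaluate them.
    """
    stack = []
    pos = 0
    while pos < len(rgce):
        ptg = rgce[pos]
        if ptg == 0x1E:                       # PtgInt: 2-byte unsigned integer follows
            (value,) = struct.unpack_from("<H", rgce, pos + 1)
            stack.append(value)
        elif ptg in (0x21, 0x41, 0x61):       # PtgFunc (reference, value, array class)
            (iftab,) = struct.unpack_from("<H", rgce, pos + 1)
            _name, argc, impl = FUNCTIONS[iftab]
            args = stack[len(stack) - argc:]  # pop argc operands, keeping their order
            del stack[len(stack) - argc:]
            stack.append(impl(*args))
        else:
            raise NotImplementedError(f"ptg {ptg:#04x} at offset {pos}")
        pos += 3                              # both handled ptgs are 3 bytes long
    return stack.pop()


print(eval_rgce(bytes.fromhex("1E2F00416F00")))  # the pcode from the record above
```

The result, "/", matches the cached string stored in the same record.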
The record has no row information, so you need to get it from the preceding BrtRowHdr record. When you reach the CellParsedFormula structure, you parse it as described in my previous article.
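Putting the BrtFmlaString layout into code, a minimal sketch could look like this. The 30-byte test payload is rebuilt from the decoded record above (column 26, cached string "/", pcode 1E 2F 00 41 6F 00); the style and grbitFlags bytes are assumed to be zero, since the dump doesn’t show them. The pieces add up to exactly the LENGTH of 30: 8 + (4 + 2) + 2 + (4 + 6) + 4, where the final 4 bytes are the cb field that closes a CellParsedFormula. FmlaString and parse_brt_fmla_string are hypothetical names:

```python
import struct
from typing import NamedTuple


class FmlaString(NamedTuple):
    column: int
    cached_value: str  # XLWideString: the formula's last computed string result
    grbit_flags: int
    rgce: bytes        # parsed expression (pcode)
    rgcb: bytes        # extra data some ptgs refer to (empty here)


def parse_brt_fmla_string(data: bytes) -> FmlaString:
    # Cell: 4-byte column, then 4 bytes of style index and flags (ignored here).
    (column,) = struct.unpack_from("<I", data, 0)
    pos = 8
    # XLWideString: 4-byte character count, then UTF-16LE characters.
    (cch,) = struct.unpack_from("<I", data, pos)
    cached = data[pos + 4:pos + 4 + 2 * cch].decode("utf-16-le")
    pos += 4 + 2 * cch
    # grbitFlags: 2 bytes.
    (grbit,) = struct.unpack_from("<H", data, pos)
    pos += 2
    # CellParsedFormula: cce + rgce, then cb + rgcb.
    (cce,) = struct.unpack_from("<I", data, pos)
    rgce = data[pos + 4:pos + 4 + cce]
    pos += 4 + cce
    (cb,) = struct.unpack_from("<I", data, pos)
    rgcb = data[pos + 4:pos + 4 + cb]
    return FmlaString(column, cached, grbit, rgce, rgcb)


# Payload rebuilt from the decoded record; style and grbitFlags assumed zero.
payload = (struct.pack("<II", 26, 0)
           + struct.pack("<I", 1) + "/".encode("utf-16-le")
           + struct.pack("<H", 0)
           + struct.pack("<I", 6) + bytes.fromhex("1E2F00416F00")
           + struct.pack("<I", 0))
print(parse_brt_fmla_string(payload))
```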
The BrtFmlaError record also starts with an 8-byte cell structure, followed by a one-byte fErr and 2 bytes of grbitFlags, before you get to the formula itself (a CellParsedFormula structure).
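The same approach works for BrtFmlaError; only the bytes between the cell structure and the formula differ. In this sketch the rgce bytes are copied from the record dumped below, while the column and the fErr value (0x0F, #VALUE!) are hypothetical, because the dump doesn’t show them. Again the sizes add up to the dump’s LENGTH of 62: 8 + 1 + 2 + (4 + 43) + 4. The error table lists the standard BErr codes; FmlaError and parse_brt_fmla_error are my own names:

```python
import struct
from typing import NamedTuple

# Standard BErr error codes, as stored in fErr.
ERRORS = {0x00: "#NULL!", 0x07: "#DIV/0!", 0x0F: "#VALUE!", 0x17: "#REF!",
          0x1D: "#NAME?", 0x24: "#NUM!", 0x2A: "#N/A"}


class FmlaError(NamedTuple):
    column: int
    error: str
    grbit_flags: int
    rgce: bytes
    rgcb: bytes


def parse_brt_fmla_error(data: bytes) -> FmlaError:
    # Cell (column + style/flags), fErr (1 byte), grbitFlags (2 bytes): 11 bytes.
    column, _style, f_err, grbit = struct.unpack_from("<IIBH", data, 0)
    pos = 11
    # CellParsedFormula: cce + rgce, then cb + rgcb.
    (cce,) = struct.unpack_from("<I", data, pos)
    rgce = data[pos + 4:pos + 4 + cce]
    pos += 4 + cce
    (cb,) = struct.unpack_from("<I", data, pos)
    rgcb = data[pos + 4:pos + 4 + cb]
    return FmlaError(column, ERRORS.get(f_err, f"unknown({f_err:#04x})"), grbit, rgce, rgcb)


# rgce copied from the dumped record; column 0 and fErr 0x0F are hypothetical.
RGCE = bytes.fromhex(
    "492700 19400001 2304000000 2314000000 0F 235D000000 0F"
    " 2346000000 0F 2315000000 0F 232F000000 0F 13"
)
payload = (struct.pack("<IIBH", 0, 0, 0x0F, 0)
           + struct.pack("<I", len(RGCE)) + RGCE
           + struct.pack("<I", 0))
print(len(payload), parse_brt_fmla_error(payload).error)
```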
When you parse the first record of this stream you’ll get:
RECORD: BrtFmlaError (Id 11, offset 1e3), LENGTH: 62
49 27 00 PtgMemFunc: 27
19 40 00 01 PtgAttrSpace: 0100
23 04 00 00 00 PtgName: index 4
23 14 00 00 00 PtgName: index 20
0F PtgIsect:
23 5D 00 00 00 PtgName: index 93
0F PtgIsect:
23 46 00 00 00 PtgName: index 70
0F PtgIsect:
23 15 00 00 00 PtgName: index 21
0F PtgIsect:
23 2F 00 00 00 PtgName: index 47
0F PtgIsect:
13 PtgUminus:
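Tokenizing that pcode needs only five more ptg types. This minimal sketch covers exactly the ones in this record (anything else raises) and reproduces the listing above. Note that PtgMemFunc’s 2-byte cce (0x27 = 39) spans the PtgAttrSpace, PtgName and PtgIsect tokens that follow it, which is why PtgUminus is left over at the end. The PtgName indexes refer to defined names in the workbook, which this sketch does not resolve; tokenize is a hypothetical name:

```python
import struct

RGCE = bytes.fromhex(
    "492700 19400001 2304000000 2314000000 0F 235D000000 0F"
    " 2346000000 0F 2315000000 0F 232F000000 0F 13"
)


def tokenize(rgce: bytes) -> list[str]:
    """Split pcode into readable tokens (only the ptg types seen in this record)."""
    tokens, pos = [], 0
    while pos < len(rgce):
        ptg = rgce[pos]
        if ptg in (0x29, 0x49, 0x69):                # PtgMemFunc: 2-byte cce, sub-expression follows inline
            (cce,) = struct.unpack_from("<H", rgce, pos + 1)
            tokens.append(f"PtgMemFunc cce={cce}")
            pos += 3
        elif ptg == 0x19 and rgce[pos + 1] == 0x40:  # PtgAttrSpace: type byte + count byte
            tokens.append(f"PtgAttrSpace type={rgce[pos + 2]} cch={rgce[pos + 3]}")
            pos += 4
        elif ptg in (0x23, 0x43, 0x63):              # PtgName: 4-byte defined-name index
            (index,) = struct.unpack_from("<I", rgce, pos + 1)
            tokens.append(f"PtgName index={index}")
            pos += 5
        elif ptg == 0x0F:                            # PtgIsect: intersection operator
            tokens.append("PtgIsect")
            pos += 1
        elif ptg == 0x13:                            # PtgUminus: unary minus
            tokens.append("PtgUminus")
            pos += 1
        else:
            raise NotImplementedError(f"ptg {ptg:#04x} at offset {pos}")
    return tokens


for token in tokenize(RGCE):
    print(token)
```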
The BrtRowHdr record is a simple structure, and for now all we want from it is the row.
The first DWORD gives you the row index you need (in this case, 2).
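Since cell records carry only a column, you have to remember the row from the last BrtRowHdr you saw. A minimal sketch, assuming the records have already been split into (recordId, data) pairs; the placeholder bytes after each record’s first DWORD stand in for fields this sketch ignores, and cells_with_rows is a hypothetical name:

```python
import struct
from typing import Iterable, Iterator, Optional

BRT_ROW_HDR, BRT_FMLA_STRING, BRT_FMLA_ERROR = 0, 8, 11


def cells_with_rows(
    records: Iterable[tuple[int, bytes]],
) -> Iterator[tuple[int, int, int, bytes]]:
    """Attach the current row (from the last BrtRowHdr) to each formula cell.

    Yields (row, column, record_id, data) in stream order.
    """
    row: Optional[int] = None
    for record_id, data in records:
        if record_id == BRT_ROW_HDR:
            (row,) = struct.unpack_from("<I", data, 0)     # first DWORD: row index
        elif record_id in (BRT_FMLA_STRING, BRT_FMLA_ERROR):
            if row is None:
                raise ValueError("cell record before any BrtRowHdr")
            (column,) = struct.unpack_from("<I", data, 0)  # Cell starts with the column
            yield row, column, record_id, data


records = [
    (0, struct.pack("<I", 2) + bytes(13)),    # BrtRowHdr for row 2 (rest is placeholder)
    (8, struct.pack("<I", 26) + bytes(26)),   # BrtFmlaString in column 26 (placeholder body)
    (11, struct.pack("<I", 3) + bytes(58)),   # BrtFmlaError in column 3 (placeholder body)
]
for row, column, record_id, _ in cells_with_rows(records):
    print(row, column, record_id)
```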
In the end, once you have parsed all these records from all the binary worksheets, you’ll end up with a virtual sheet that looks like this:
That is more informative, but it was a bit of work to get there. At least that is context you can relate to.
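One way to lay out such a virtual sheet in code is to collect every decoded cell as (sheet, row, column, value) and group the cells per sheet under A1-style addresses. This is my own presentation choice, not necessarily how the view above was produced; it assumes zero-based row and column indexes (as [MS-XLSB] uses), and the sheet names, values and function names in the example are hypothetical:

```python
from collections import defaultdict


def column_letters(col: int) -> str:
    """Convert a zero-based column index to Excel letters (0 -> A, 25 -> Z, 26 -> AA)."""
    letters = ""
    col += 1
    while col:
        col, rem = divmod(col - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def build_virtual_sheet(cells):
    """Group (sheet, row, column, value) tuples into {sheet: {address: value}}.

    Cells are inserted in row-major order, so each sheet's dict
    iterates from the top-left cell to the bottom-right one.
    """
    grid = defaultdict(dict)
    for sheet, row, col, value in sorted(cells, key=lambda c: c[:3]):
        grid[sheet][f"{column_letters(col)}{row + 1}"] = value
    return dict(grid)


cells = [
    ("xl/worksheets/sheet2.bin", 2, 3, "#VALUE!"),
    ("xl/worksheets/sheet1.bin", 20, 26, "/"),
]
for sheet, entries in build_virtual_sheet(cells).items():
    for address, value in entries.items():
        print(f"{sheet}!{address}: {value}")
```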
Now, as I write this article, I see that VT has indeed received a copy, and when the sample was first scanned (on submission), a single engine detected it:
Kudos to Ikarus! I think my weekend project is over. I have a mental quirk when it comes to problems: I can’t let one go until it’s solved, so now I can finally relax. Let me know if you need help! I think tools should give this kind of context automagically.