One of my first blogs was about de-compiling Excel Formula code from the formula statements in the workbook stream. Today we’ll see how we can run this code in Python.
First of all, we need to logically build-up the code the formula-entries which are describe in the Workbook stream. To understand each parsed-expression I refer you back to the first blog. Excel Formula can run both parsed expressions and strings stored in cells as text (which will be parsed expressions in Excel), so we need to establish a common format. I choose the textual representation of the parsed expressions, as this allows me to also run text-lines generated by the code later.
This means all the parsed expressions in the previous blog can be translated to:
"A2","=DEFINE.NAME ('x_b2w', 0)" "A3","=WHILE (x_b2w < 49)" "A4","=DEFINE.NAME ('x_l5', -1.0)" "A5","=DEFINE.NAME ('x_b2w', x_b2w + 1)" "A6","=WHILE (x_l5 < 22)" "A7","=DEFINE.NAME ('x_l5', x_l5 + 1)" "A8","=INDIRECT (ADDRESS (x_l5 + 1, 38 + x_b2w)) = 'koveowvnb'" "A9","=B45 & INDIRECT (ADDRESS (x_l5 + 1, 38 + x_b2w))" "A10","=IF (A8, DEFINE.NAME ('x_l5', 24), SET.VALUE (B45, A9))" "A11","=NEXT ()" "A12","=FORMULA (B45, ABSREF ('R[' & x_b2w & ']C', A73))" "A13","=SET.VALUE (B45, '')" "A14","=NEXT ()" "A203","=HALT ()"
So here we see the cell (A2, A3, A4..) and the textual representation of the parsed expressions. When you parse the parsed expressions you build a stack which allows you to build the logical flow. Challenge here is PtgFunc, which unlike PtgFuncVar – does not tell you how many parameters this function require. You need to look at Microsoft documentation and that causes some issues.
Here is one example on how to get the textual representation of the parsed expression.
row 9, col 0, ifxe 15, FormulaValue=02 01 1D 00 00 00 fExpr0=FFFF flags=0000 44 07 00 00 C0 PtgRef: loc col=0, row=7, value=EMPTY 19 02 12 00 PtgAttrIf: 0012 17 04 00 78 5F 6C 35 PtgStr: x_l5 1E 18 00 PtgInt: 24 42 02 3D 80 PtgFuncVar: DEFINE.NAME, param=2, tab=61, fCeFunc=1 19 08 14 00 PtgAttrGoto: 0014 24 2C 00 01 C0 PtgRef: loc col=1, row=44, value=EMPTY 44 08 00 00 C0 PtgRef: loc col=0, row=8, value=EMPTY 41 6C 00 PtgFunc: SET.VALUE (108) 19 08 03 00 PtgAttrGoto: 0003 42 03 01 00 PtgFuncVar: IF, param=3, tab=1, fCeFunc=0
You start from the end (I ignore PtgAttrx statements):
- PtgFuncVar with 3 parameters: IF (x,x,x)
- Param1: PtgFunc SET.VALUE – takes 2 parameters
- Param 1: PtgRef location col=0, row=7
- Param 2:PtgRef location col=1, row=44
- Param 2: PtgFuncVar DEFINE.NAME (2 param)
- Param 1: PgtInt 24
- Param 2: PtgStr: x_15
- Param 3: PtgRef location col=0, row=7
- Param1: PtgFunc SET.VALUE – takes 2 parameters
When you map this together you get:
IF (A8, DEFINE.NAME ('x_l5', 24), SET.VALUE (B45, A9)) IF (Param3, Param2, Param1)
What is the code really doing?
- It sets up a few variables
- Creates a while loop while x_b2w < 49 – this is the columns it wants to read from (AM1 first)
- Sets x_l5 to -1, adds the while_loop counter by 1
- Now it enters a loop (x_l5 < 22) – this is the number of rows for each column it wants to get values from
- It now sets A8 to be a boolean state if the content of the cell has the value ‘koveowvnb’
- It fetches the string from the same cell + the original content from B45 (concatenation).
- If A8 (boolean state, done finding cell values) – it sets the while variable above 22.
- If not, it sets the global string for this line, B45, to the value
- When the string is done, it uses the Formula command to insert the code of this string to the spreadsheet
If we manually rewrite this code to Python – what would it look like?
B45 = "" # needed as XF use it to append data to it A73 = 0 # used by Formula statement, not initialized to any value x_b2w = 0 while ( x_b2w < 49 ): x_l5 = -1.0 x_b2w = x_b2w + 1 while (x_l5 < 22): x_l5 = x_l5 + 1 A8 = INDIRECT (ADDRESS (x_l5 + 1, 38 + x_b2w)) == 'koveowvnb' A9 = str(B45) + INDIRECT (ADDRESS (x_l5 + 1, 38 + x_b2w)) if A8: x_l5=24 else: B45=A9 FORMULA (B45, ABSREF ('R[' + str(x_b2w) + ']C', A73)) B45 = ''
That doesn’t look to hard. Some interesting challenges doing this automatically will be Integer to Str conversion and really following what variables are defined before referenced. Maybe I’ll find another way to name them.
Next step is we need to have all the cell-information so the INDIRECT call can find the content behind each cell. To do that xf2python dumps a CSV with all the cells:
The other calls to make are ABSREF, ADDRESS and FORMULA.
When you run this code it produces the following output:
You can see from the cellinfo that the data isn’t encrypted at this stage, it just concatenates strings in the various columns to write them to the sheet. As you can now see, the next step now is to continue running between A14 (NEXT) and A203 (HALT). That will be the code above, so the python interpreter needs to be flexible. The code here has encrypted commands, which are stored in the C1 to C18 cells.
This will be fun to write! Stay tuned.