Pycel: Compiling Excel spreadsheets to Python and making pretty pictures

Background

As described in my previous post, my job is to assemble a software suite that is used by aerospace engineers to design civillian UAVs.  Our main use case being search and rescue.

Every aircraft design starts its life in a concept tool where the configuration choice (V-tail, canard, flying wing, …) is made and where the principal geometrical parameters (wing area, CoG, tail volume, …)  are decided.  This then allows the calculation of the approximate weight and performance of the aircraft (e.g., maximum speed, range, endurance, …).

While this concept stage is only a rough approximation it already involves (in our case) some 250 variables which are interlinked in a complex web of equations.  These sizing equations capture a great deal of knowledge about the physics of flying an aircraft as well as empirical relationships drawn from historical & experimental data.  Solving these equations is really a Multidisciplinary Design Optimization (MDO) problem, a whole field of research in its own right.

After working with the Pacelab Suite for a year we have (for various reasons) switched to spreadsheets for implementing & experimenting with the sizing equations.   This brings great flexibility for the designer but makes things more tricky to parallelize & optimize for me.  For example, I cant just throw Excel on the 8000+ core linux cluster we have here.  Also, while the inbuilt solver plugin is easy and quick for solving small scale systems.  As our equation system got more complex the process just breaks down.

It was also clear that our sizing complexity had outgrown the spreadsheet UI and had become hard to understand for a newcomer.

Early experiments

Being a programmer I feel more at home manipulating code than manipulating spreadsheets.   Early on I had already experimented with visualizing the sizing logic by customizing and extending the Trace addin.  This resulted in images like this:

With some more graphviz magic you could add some structure by grouping related parameters:

And trace the impact of a particular parameter:

However, I soon ran into limitations of the static graphviz model and VBA (the language Trace is written in) is not the most happy place to be (to put it politely).

Python to the rescue

This work then lay dormant as I was sidetracked by other stuff.  However, with Pacelab being abandoned and the spreadsheet itch still being there I could not resist having a go at rewriting my own excel visualization library and throwing in some compilation for good measure.  The goal being to:

  • Allow better visualization and exploration of the sizing equations, to understand how variables influence each other
  • Compile the spreadsheet into code that can be manipulated, transformed and executed independently from excel.

There are already a number of commercial tools for compiling spreadsheets but I did not find an open source library which did what I needed.  And anyways, I did not want to look too hard since it would be something fun to do myself.  Why python? Its the language I was using at the time and one I particularly enjoy.

Piecing things together

A core problem in making all of this work is tokenizing an excel formula properly. Luckily this had already been solved and after a small patch to the python port that was working. Throw in shunting yard, the conversion to an AST, and I was ready to emit code.  For example:

Formula:  =3 + 4 * 2 / ( 1 - 5 ) ^ 2 ^ 3
RPN:  3|4|2|*|1|5|-|2|^|3|^|/|+
Python code:  3+((4*2)/(((1-5)**2)**3))

or

Formula:  =If(SUM(A1:A5) > 0, sin(0.3), ln(2.5))
RPN:  A1:A5|SUM|0|>|0.3|sin|2.5|ln|If
Python code:  sin(0.3) if xsum(eval_range("A1:A5"))>0 else xlog(2.5)

As you can see this means mapping excel functions/operators onto python ones and adding python equivalents where necessary.

Dependency tracking

Besides the issue of function mapping there is the issue of dependency tracking when updating cells.  So if A1 contains the formula B1 + 5 then you need to be aware of this dependency and update A1 whenever B1 changes.  For this I used a graph as the base data model as it is most intuitive.  Thanks to pythons networkx this was very easy to do.

The final result is a small python library that:

  • takes an excel spreadsheet & one or more seed cells
  • follows the dependency graph, starting from the seeds
  • generates a graph where each node is a cell or range and contains the python equivalent of the excel formula

This graph can then be serialized to disk and run independently of Excel (e.g., on linux).  Caching and lazy evaluation ensures the calculation and update process is relatively fast.  I have tested extensively with a spreadsheet with multiple sheets and over 1000 formulae.  Full calculation times are around 50ms which is plenty fast for our application.  Also, I can now run multiple instances in parallel, something that was virtually impossible to do by orchestrating Excel via Pthon over COM (yes I know there is solution from Microsoft for running Excel on a cluster but I did just not want to go there).

Pretty pictures

This means its now time for an intermezzo with some pretty pictures, courtesy of Gephi. The node color is determined by the worksheet the node comes from and the size of a node is determined by how many cells it depends on.  Ever think an aircraft design problem could look pretty? 🙂

Using some layout algorithms you can start pulling things apart:

And look at the macroscopic structure:

Or zoom in for the microscopic structure:

Here the colors represent which variable group the node belongs to. Stripping out things further you can look at one particular variable, for example the wing area, and see how it influences others:

Finally, optimization

Ok, so I could now serialize an excel spreadsheet to a python graph.  However, the ultimate goal is to build an aircraft that can actually fly, or in aero speak, it is balanced (lift equals the weight, center of gravity is in the right place, etc.)  This still involves an optimization process which now had to be done in Python.  For this I added an extra layer on top of the compilation process using OpenOpt.  After some bugfixing and testing I am happy to report that, without doing anything fancy, it solves in 98% of the time and takes 30 seconds to 1 minute to solve.  Compared with the original 65% and 10 minutes.

As a sanity check I then did some parameter sweeps to generate surface plots such as these:

Roughly this plot tells you that as you increase the payload weight and the importance of a low landing speed, your dry weight (the weight without fuel) goes up.  Which is exactly as you would expect (ignore the ridge at the bottom left).

Where is the code?

My employer has been kind enough to let me release the code which is now available on github!  An example is included and the readme lists the pro’s and con’s of the current functionality.  May the patches roll in 🙂

Thoughts and Future work

So far this has been an interesting little side project and been very useful in the wider design system.  Recall from my previous post that this is just one small piece in the puzzle.  In the wider system this is linked up with CFD simulations, costing codes, an operational model and what have you.  One disadvantage of spreadsheets, though, is that designers tend to use large tables to interpolate out equations instead of writing a few lines of code.  This can add large numbers of nodes/edges to the graph, thus increasing the visual complexity, without adding much to the true problem complexity.

I think there is potential to explore this topic further, but this would really need an application specific layer on top of a graph-only tool like gephi.  However, it does not look like this is where my priorities will lay, but maybe it will inspire somebody else 🙂

There is a good chance I will be revisiting this type of work when we look at visualizing design rationale data and how it evolves during the aircraft design process.  But that is a separate topic for a separate post 🙂

Thoughts, comments, suggestions, constructive criticism, patches? Dont be shy!

–Dirk

11 thoughts on “Pycel: Compiling Excel spreadsheets to Python and making pretty pictures

  1. Pingback: Pycel: Compiling Excel spreadsheets to Python and making pretty … | Linux Affinity

  2. Pingback: Guides, Papers and APIs – Part 2 | The Naked Technologist

  3. Pingback: “Click here” to design and build your UAV | Dirk's Page

  4. FYI, I’m working on an equivalent to the excelwrapper.py file that allows OpenOffice/LibreOffice and Uno to do the same job that Office and COM are doing in the original version. I have the supplied example working correctly, but haven’t yet thrown any good-sized spreadsheets at it. Feel free to contact me if you’re interested in this effort, at pyceloo -at- digitalfish -dot- com.

    • Hi Steve,

      Your email address does not work apparently. Anyways, good stuff! Happy to integrate any improvements. Haven’t worked on the code for a while as our internal version does what it needs. We are putting together a project proposal around this work that, if all approved, will take it to a much higher level.

      Cheers
      Dirk

  5. Dear Dirk,
    I tried to use pycel as an addin and I found out that there is one line left in “excelcompiler.py” (line 1) from import excellib. I deleted it because Pyxll gave me an error in the log (invalid syntax). After that I get the addin loaded but once I try to compile one cell I get the following compilation error:

    —————————
    PyXLL Error
    —————————
    Error calling function for menu item Compile selection

    [global name ‘excellib’ is not defined]
    —————————
    OK
    —————————

    Can you suggest me a quick workaround for this (I’m not an expert programmer).
    Thanks a lot for the help and for this very interesting project

      • Thank you for the reply, maybe I didn’t explain the problem very well. It was not related with from (I corrected it the first time I launched the plugin as I recognized that it was an error) but with the fact that pyxll give me this error:
        —————————
        PyXLL Error
        —————————
        Error calling function for menu item Compile selection

        [global name ‘excellib’ is not defined]
        —————————
        OK
        ————————
        I think it depends on how the variable is declared in the code (local/global) but I don’t know how it can be possible to solve this. Anyway thanks a lot for the help

      • Could you open an issue on the github page as its a better place to discuss such things. Make sure you post the full stack trace from the pycel stacktrace.
        Cheers
        Dirk

      • Hi Dirk,
        I solved the problem (it was very simple I only deleted one row), I posted the solution in the github issues page. I would to ask one more thing, I would like to use the compiled code (.pickle) with openopt to solve some nonlinear optimization problems (I used only pulp and cvxopt for LP, MIP and SOCP up to now in Python), can you create a small example? It will be very helpful. In any case thanks a lot

      • Again, please dont post such questions as comments but as issues on github. Essentially you need a function handle for your objective function and constraints. Each function would look something like this:

        def my_fun(X):
        # X is your solution vector
        for x in X:
        # set the values in the spreadsheet
        self.set_value(“”, x)
        # calculate the target cell
        try:
        res = self.evaluate(“”)
        except Exception as e:
        # do something in case of error
        res = float(‘nan’)

        return res

        You then pass this function handle to openopt.

Leave a comment