John Machin wrote: > On Apr 2, 1:59 am, Duncan Smith <[EMAIL PROTECTED]> wrote: > >>Hello, >> I am currently implementing (mainly in Python) 'models' that come >>to me as Excel spreadsheets, with little additional information. I am >>expected to use these models in a web application. Some contain many >>worksheets and various macros. >> >>What I'd like to do is extract the data and business logic so that I can >>figure out exactly what these models actually do and code it up. An >>obvious (I think) idea is to generate an acyclic graph of the cell >>dependencies so that I can identify which cells contain only data (no >>parents) and those that depend on other cells. If I could also extract >>the relationships (functions), then I could feasibly produce something >>in pure Python that would mirror the functionality of the original >>spreadsheet (using e.g. Matplotlib for plots and more reliable RNGs / >>statistical functions). > > > There are two things you have to consider that can create those > dependencies (1) formulas (2) macros. If it were only formulas, you > wouldn't need to generate an acyclic graph. A no-parents cell will be > defined by a type-specific record (e.g. NUMBER). Dependent cells are > defined by a FORMULA etc record. Similar story with the XML format in > Office 2007. >
Well yes, partitioning the cells according to format would be start. Now you have me wondering about the new XML format. Maybe that would be easier to deal with? > I'm not aware of any way to use Python to access the content of > macros. It may be possible using COM ... > I suppose I can do that manually. Most of them seem to do with presentation rather than the business logic. > I'm a bit puzzled by your notion of creating a dependency graph > *without* first extracting the "relationships (functions)" [which you > could do only by parsing the formulas and macros]. > Incomplete parsing :-). I thought I might be able to pull out the dependencies relatively easily, then go back to the spreadsheet to examine the formulae. > "pure Python that would mirror the functionality of the original > spreadsheet": mammoth job; speed? > Basically to try and separate the data, business logic and presentation for coding up in Python. Particularly, I don't want to rely on Excel's statistical functions. But it's also possible that other similar models will be making there way to me, and I'd rather have a few simple classes to handle the business logic, and have model parameters etc. tucked away in text files. > Before you do that, have a look at (1) xlrd docs (2) xlrd source (3) > Openoffice.org docs of XLS format (4) MS docs (now an ECMA standard) > of Office 2007 XML-based files. > Yes, will do. > >>The final application will be running on a Linux server, but I can use a >>Windows box (i.e. win32all) for processing the spreadsheets (hopefully >>not manually). Any advice on the feasibility of this, and how I might >>achieve it would be appreciated. >> >>I assume there are plenty of people who have a better knowledge of e.g. >>COM than I do. I suppose an alternative would be to convert to Open >>Office and use PyUNO, but I have no experience with PyUNO and am not >>sure how much more reliable the statistical functions of Open Office >>are. At the end of the day, the business logic will not generally be >>complex, it's extracting it from the spreadsheet that's awkward. Any >>advice appreciated. TIA. Cheers. >> >>Duncan > > > PyUNO: Google around a bit. My experience was: tried it, got lost in > the jungle of its documentation. > Yep. Me too. I might have another go at it. Cheers (to all who replied). Duncan -- http://mail.python.org/mailman/listinfo/python-list