Ben Stroud wrote:
> George Sakkis wrote:
>
>> After a brief search, I didn't find any python package related to OLAP
>> and pivot tables. Did I miss anything ? To be more precise, I'm not so
>> interested in a full-blown OLAP server with an RDBMS backend, but
>> rather a pythonic API for constructing datacubes in memory, slicing and
>> dicing them, drilling down or up dimensions and exposing them in some
>> suitable form to a presentation layer. I've hacked a first cut of a
>> pivot table implementation and an XHTML generator that produces
>> hierarchical html tables but it's not particularly general or easily
>> extensible so far. Is there any interest at all on a pythonic version
>> of something like JOLAP or XMLA ?
>>
> I'd be interested as well. I posted a similar question to the ruby
> mailing list a few months ago to no avail. Ideally, someone much more
> talented than myself would create a open OLAP library in C that could be
> interfaced with dynamic languages easily (I ordered some OLAP books and
> started in on this, and decided I was in over my head for now). As far
> as free software, all I've been able to find is java-based Mondrian.
> Maybe it could serve as a reference implementation for someone.

The NetEpi Analysis project (see http://sourceforge.net/projects/netepi), although not strictly an OLAP or datacube engine, might offer some of the things you are looking for. It is intended for exploratory epidemiological analysis of (potentially large) health-related datasets, but should work with most types of data for which an OLAP engine would be useful.

Underneath there is a vertically-disaggregated, ordinally-mapped, set-theoretic data selection and summarisation engine, which is a pompous way of saying that it holds data column-wise in memory-mapped Numpy (Numeric Python) arrays, and uses some fast (custom-written) set functions on inverted indexes on the ordinal positions of column values to select and summarise data. This is done entirely at run-time, cf. most OLAP engines, which rely on a degree of pre-summarisation along pre-chosen dimensions.

It is all Python and thus has a Python(ic) API, including an SQL-like WHERE clause parser for data selection (OK, SQL is not Pythonic, but that's just for data subsetting). It includes quite a few statistical functions and nice graphics courtesy of R (http://www.r-project.org), which is embedded via RPy (http://rpy.sourceforge.net/). Full support for missing values and weighted datasets is provided (but not full support for survey data with complex sample designs - that's forthcoming).

Currently it works well with datasets in the 5-10 million row range, but the basic design lends itself easily to parallelisation if you have bigger datasets, and preliminary work indicates good speed improvements - something we want to pursue given all these multi-core CPUs which are now available at reasonable cost.

Be warned that NetEpi Analysis is currently only of beta quality, and is a bit of a pig to install; it runs on Linux/Unix/Mac OS X only at present. We hope to be able to release a production-ready Version 1.0 by the end of 2006, possibly with MS-Windows support as well.
However, the core data summarisation/subsetting engine is thought to be sound (and there are some unit tests to attest to that).

Probably not quite what you were after but I thought it worth a mention. Please post follow-ups, if any, to the NetEpi mailing list: http://sourceforge.net/mail/?group_id=123700

Tim C

>
> Cheers,
> Ben
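
For readers who want a feel for the column-wise, inverted-index idea described above, here is a minimal sketch. It is not NetEpi Analysis code and does not use its API: the Column class, the crosstab function and the sample data are all made up for illustration, and plain Python lists and sets stand in for the memory-mapped arrays and custom set functions. Each column keeps an inverted index from value to the set of row positions holding it, and a cross-tabulation is computed at run-time by intersecting those sets.

```python
from collections import defaultdict


class Column:
    """Hypothetical column store: values plus an inverted index
    mapping each distinct value to the set of row positions holding it."""

    def __init__(self, values):
        self.values = list(values)
        index = defaultdict(set)
        for row, value in enumerate(self.values):
            index[value].add(row)
        self.index = dict(index)

    def rows_where(self, value):
        """Return the set of row positions where the column equals value."""
        return self.index.get(value, set())


def crosstab(row_col, col_col, selection=None):
    """Count rows for each (row value, column value) pair by intersecting
    row-position sets; an optional selection set restricts the rows counted.
    Nothing is pre-summarised: counts are derived on demand."""
    table = {}
    for r_value, r_rows in row_col.index.items():
        for c_value, c_rows in col_col.index.items():
            rows = r_rows & c_rows          # new set, safe to modify below
            if selection is not None:
                rows &= selection
            if rows:
                table[(r_value, c_value)] = len(rows)
    return table


# Made-up sample data, one list per column.
sex = Column(["F", "M", "F", "F", "M", "M"])
region = Column(["north", "north", "south", "south", "south", "north"])
year = Column([2004, 2004, 2004, 2005, 2005, 2005])

everything = crosstab(sex, region)
only_2005 = crosstab(sex, region, selection=year.rows_where(2005))
print(everything)
print(only_2005)
```

The selection argument plays the role of a WHERE clause here: any subset of row positions can be passed in, so slicing by a third column needs no pre-built cube.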