On Sep 22, 4:00 pm, snfctech <tschm...@sacfoodcoop.com> wrote:
> Does anyone have experience building a data warehouse in python? Any
> thoughts on custom vs using an out-of-the-box product like Talend or
> Informatica?
>
> I have an integrated system Dashboard project that I was going to
> build using cross-vendor joins on existing DBs, but I keep hearing
> that a data warehouse is the way to go. e.g. I want to create orders
> and order_items with relations to members (MS Access DB), products
> (flat file) and employees (MySQL).
>
> Thanks in advance for any tips.

I use both Python and a data warehouse tool from IBM (DataStage) that is similar to Informatica. The main difference from Python is throughput. The tool's sort and join routines are multithreaded C code and can handle data sets larger than what fits in RAM. It also has good native drivers for the DB2 database. For data conversions and other transformations, rows are spread across the available CPUs and processed in parallel, so you can really put a 16-core machine to good use with this thing.

In your case you probably won't have enough data to justify the cost of buying a tool. They are quite expensive.
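
To give an idea of what the custom route can look like at small volumes, here is a minimal sketch using only the standard library. Everything in it is an assumption for illustration: the sample rows, the column names, and the choice of SQLite as the warehouse are all hypothetical, and in practice the members and employees rows would come from the Access and MySQL databases through whatever drivers you already use. The point is only that once every source is copied into one database, the cross-vendor joins become ordinary local joins.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for the three sources in the question.
PRODUCTS_CSV = "product_id,name,price\n1,Apples,2.50\n2,Bread,4.00\n"  # flat file
MEMBERS = [(10, "Alice"), (11, "Bob")]   # would be exported from MS Access
EMPLOYEES = [(100, "Carol")]             # would be read from MySQL
ORDER_ITEMS = [  # (order_id, member_id, employee_id, product_id, quantity)
    (1, 10, 100, 1, 4),
    (1, 10, 100, 2, 1),
    (2, 11, 100, 2, 3),
]


def build_warehouse(conn):
    """Copy every source into one SQLite database so that joins are local."""
    conn.executescript(
        """
        CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT, price REAL);
        CREATE TABLE members (member_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE order_items (
            order_id INTEGER, member_id INTEGER, employee_id INTEGER,
            product_id INTEGER, quantity INTEGER);
        """
    )
    # Parse the flat file and convert each field to its column type.
    reader = csv.DictReader(io.StringIO(PRODUCTS_CSV))
    conn.executemany(
        "INSERT INTO products VALUES (?, ?, ?)",
        ((int(r["product_id"]), r["name"], float(r["price"])) for r in reader),
    )
    conn.executemany("INSERT INTO members VALUES (?, ?)", MEMBERS)
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    conn.executemany("INSERT INTO order_items VALUES (?, ?, ?, ?, ?)", ORDER_ITEMS)
    conn.commit()


def order_totals(conn):
    """Return (order_id, member name, order total) for each order."""
    return conn.execute(
        """
        SELECT oi.order_id, m.name, SUM(oi.quantity * p.price) AS total
        FROM order_items AS oi
        JOIN members AS m ON m.member_id = oi.member_id
        JOIN products AS p ON p.product_id = oi.product_id
        GROUP BY oi.order_id, m.name
        ORDER BY oi.order_id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
totals = order_totals(conn)
print(totals)
```

The sort and join work is left to the database engine here rather than done in Python loops, which is roughly what a commercial tool does for you at much larger scale.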