I've built several datamarts using perl and MySQL. The largest ones have been up to about 30GB, so I'm not quite on your scale.
for #1, I have an etl_id in the fact table so I can track back any particular ETL job. I typically make it a dimension and include date, time, software version, etc. That doesn't help so much if you're messing up your dimension tables, but I haven't typically run into that problem based on the designs I've used. For #2, I haven't built anything big enough for it to be a concern yet.. Also, LOAD DATA INFILE is your friend :) On Thu, Apr 3, 2008 at 11:28 AM, Dre <[EMAIL PROTECTED]> wrote: > Hey folks, > > I'm currently deciding whether to build a decent sized (around 300-500GB, > although honestly, I've got little to base that on at the moment) data > warehouse in postgreSQL or MySQL. I've developed several in MS SQL and > postgreSQL, but the client is comfortable with MySQL, and I'd prefer to use > that as the platform since it will be less painful for them to manage when > I'm gone. I'm hoping that someone with experience building a warehouse on > MySQL will be able to answer two outstanding questions I have: > > 1) Several sources seem to suggest MyISAM is a good choice for data > warehousing, but due to my lack of experience in a transaction-less world, > this makes me a little nervous. How do you handle data inconsistency > problems when ETL jobs fail? (For the record, I don't use a separate tool > for the ETL; I usually use perl/shell scripts to interact with the file > system, and pl/pgsql or transact-sql once the data is loaded into the > staging database. For each file that is loaded, I'll identify steps that > must be posted together, and wrap them in a transaction in the ETL job.) I > can see doing something like manually cleaning out the necessary tables > before you re-run, but that seems a bit messy to me. Anyone figure out a > better approach? > > 2) Isn't the lack of bitmap indexes a problem in the warehouse? Most FKs in > the fact tables will be low cardinality columns; queries that didn't use > date would be very slow on large fact tables (MS SQL had this problem). Has > anyone run into this with MySQL? > > Many thanks in advance! > > -- > MySQL General Mailing List > For list archives: http://lists.mysql.com/mysql > To unsubscribe: > http://lists.mysql.com/[EMAIL PROTECTED] > > -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]