I've built several data marts using Perl and MySQL. The largest ones
have been up to about 30GB, so I'm not quite at your scale.

For #1, I keep an etl_id in the fact table so I can trace any row back
to the ETL job that loaded it. I typically make it a dimension that
records the date, time, software version, etc. of each run. That
doesn't help as much if a failed job leaves your dimension tables in a
bad state, but I haven't typically run into that problem with the
designs I've used.
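A minimal sketch of one way an etl_id can stand in for a rollback on a
non-transactional engine like MyISAM: every fact row is stamped with the
id of the run that wrote it, so a failed run can be backed out with a
single DELETE. Assumptions: SQLite (Python's built-in sqlite3) is used
in place of MySQL so the example runs standalone, and the table and
column names (etl_dim, sales_fact, date_key, etc.) are hypothetical,
not taken from the poster's schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: an ETL-run dimension plus a fact table whose rows
# carry the etl_id of the job that loaded them. SQLite stands in for MySQL.
SCHEMA = """
CREATE TABLE etl_dim (
    etl_id INTEGER PRIMARY KEY,
    started_at TEXT NOT NULL,
    software_version TEXT NOT NULL
);
CREATE TABLE sales_fact (
    date_key INTEGER NOT NULL,
    product_key INTEGER NOT NULL,
    amount REAL NOT NULL,
    etl_id INTEGER NOT NULL REFERENCES etl_dim(etl_id)
);
"""


def start_etl_run(conn, software_version):
    """Register a new ETL run in the dimension and return its etl_id."""
    cur = conn.execute(
        "INSERT INTO etl_dim (started_at, software_version) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), software_version),
    )
    conn.commit()
    return cur.lastrowid


def load_facts(conn, etl_id, rows):
    """Insert (date_key, product_key, amount) rows stamped with etl_id."""
    conn.executemany(
        "INSERT INTO sales_fact (date_key, product_key, amount, etl_id) "
        "VALUES (?, ?, ?, ?)",
        [(d, p, a, etl_id) for d, p, a in rows],
    )
    conn.commit()


def back_out_run(conn, etl_id):
    """Delete every fact row written by one run; return the row count."""
    cur = conn.execute("DELETE FROM sales_fact WHERE etl_id = ?", (etl_id,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    good = start_etl_run(conn, "1.0")
    load_facts(conn, good, [(20080401, 1, 9.5), (20080401, 2, 3.0)])
    bad = start_etl_run(conn, "1.0")
    load_facts(conn, bad, [(20080402, 1, 7.25)])  # pretend this run failed
    removed = back_out_run(conn, bad)
    remaining = conn.execute("SELECT COUNT(*) FROM sales_fact").fetchone()[0]
    print(removed, remaining)  # 1 2
```

The DELETE ... WHERE etl_id = ? statement is plain SQL and reads the same
in MySQL; an index on etl_id would keep it cheap on a large fact table.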

For #2, I haven't built anything big enough for it to be a concern yet.

Also, LOAD DATA INFILE is your friend :)
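A small sketch of how LOAD DATA INFILE could be combined with the
etl_id idea: the statement's SET clause stamps every loaded row with the
current run's id. This is one possible combination, not necessarily how
the poster uses it. Assumptions: a tab-delimited input file, the
default sql_mode (backslash escapes enabled), and a hypothetical helper
name, build_load_data_sql; the returned string would be executed through
whatever MySQL client or driver you already use.

```python
def build_load_data_sql(path, table, columns, etl_id):
    """Build a LOAD DATA INFILE statement for a tab-delimited file.

    Every loaded row gets etl_id set to the given run id via SET.
    (Hypothetical helper; assumes NO_BACKSLASH_ESCAPES is off.)
    """
    if not isinstance(etl_id, int):
        raise TypeError("etl_id must be an int")
    # Escape backslashes and single quotes inside the MySQL string literal.
    quoted_path = path.replace("\\", "\\\\").replace("'", "\\'")

    def ident(name):
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    col_list = ", ".join(ident(c) for c in columns)
    return (
        f"LOAD DATA INFILE '{quoted_path}'\n"
        f"INTO TABLE {ident(table)}\n"
        "FIELDS TERMINATED BY '\\t'\n"  # MySQL reads \t as a tab
        "LINES TERMINATED BY '\\n'\n"
        f"({col_list})\n"
        f"SET etl_id = {etl_id}"
    )


if __name__ == "__main__":
    print(build_load_data_sql(
        "/tmp/sales_20080403.tsv", "sales_fact",
        ["date_key", "product_key", "amount"], 42))
```

Without LOCAL, the server reads the file from its own host and the
MySQL account needs the FILE privilege; LOAD DATA LOCAL INFILE reads it
from the client side instead.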

On Thu, Apr 3, 2008 at 11:28 AM, Dre <[EMAIL PROTECTED]> wrote:
> Hey folks,
>
> I'm currently deciding whether to build a decent-sized (around 300-500GB,
> although honestly, I've got little to base that on at the moment) data
> warehouse in PostgreSQL or MySQL.  I've developed several in MS SQL and
> PostgreSQL, but the client is comfortable with MySQL, and I'd prefer to use
> that as the platform since it will be less painful for them to manage when
> I'm gone.  I'm hoping that someone with experience building a warehouse on
> MySQL will be able to answer two outstanding questions I have:
>
> 1) Several sources seem to suggest MyISAM is a good choice for data
> warehousing, but due to my lack of experience in a transaction-less world,
> this makes me a little nervous.  How do you handle data inconsistency
> problems when ETL jobs fail?  (For the record, I don't use a separate tool
> for the ETL; I usually use Perl/shell scripts to interact with the file
> system, and PL/pgSQL or Transact-SQL once the data is loaded into the
> staging database.  For each file that is loaded, I'll identify steps that
> must be posted together, and wrap them in a transaction in the ETL job.)  I
> can see doing something like manually cleaning out the necessary tables
> before you re-run, but that seems a bit messy to me.  Has anyone figured out
> a better approach?
>
> 2) Isn't the lack of bitmap indexes a problem in the warehouse?  Most FKs in
> the fact tables will be low-cardinality columns; queries that didn't use
> date would be very slow on large fact tables (MS SQL had this problem).  Has
> anyone run into this with MySQL?
>
>  Many thanks in advance!
>
>  --
>  MySQL General Mailing List
>  For list archives: http://lists.mysql.com/mysql
>  To unsubscribe:
> http://lists.mysql.com/[EMAIL PROTECTED]
