so i'm debating between mysql and postgresql so far for choosing our data warehouse platform, and i've been using mysql for about 8 years now and would hate to move into postgresql if possible. mysql definitely has it's pros and cons, but so far, more pros than cons except for big joins. mysql outperforms postgresql in simple counts, sums, etc... but is a lot slower in big joins as far as i can tell so far. so as long as i can optimize these queries to work in mysql, i'll be saved and can stick with my favorite database. i'm only playing with about 5 days of data so far, but once i hit 30 days, it'll be huge.
so i created a star schema in 5.0.27 so far, and have only about 5 days of data in here, equalling about 63 million rows of fact data. so this will become huge over 30 days or a year, which is how long they want to keep raw data for. here is the query i'm battling with, and the reason why i have to do a count distinct on uu_id, is because everything is based on unique users, not visits. if i could use visits, i could easily use that as an aggregate in the fact table, but can't. so let me know if anyone has any experience with building data warehouses in mysql and if you have any recommendations on what i might be doing wrong etc... or what i can do to optimize. > select A.gender as gender, B.area as area, sum(C.imps) as imps, sum(C.clicks) > as clicks, count(distinct(C.uu_id)) as users > from uus as A, areas as B, daily_area_fact as C > where A.uu_id = C.uu_id > and B.area_id = C.area_id > group by gender,area; so this takes about 1.3 hours to run. postgresql takes about 50 minutes, using the same hardware/data/memory/working memory/cpu etc... fact table: mysql> desc daily_area_fact; +-----------+---------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +-----------+---------------+------+-----+---------+-------+ | time_id | int(11) | NO | PRI | 0 | | | ccid | int(11) | NO | PRI | 0 | | | area_id | int(11) | NO | PRI | 0 | | | uu_id | int(11) | NO | PRI | 0 | | | geo_id | int(11) | NO | PRI | 0 | | | imps | int(11) | YES | | NULL | | | clicks | int(11) | YES | | NULL | | | imp_rev | decimal(10,6) | YES | | NULL | | | click_rev | decimal(10,6) | YES | | NULL | | | total_rev | decimal(10,6) | YES | | NULL | | +-----------+---------------+------+-----+---------+-------+ 10 rows in set (0.03 sec) area dimension table: mysql> desc areas; +---------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +---------+--------------+------+-----+---------+----------------+ | area_id | int(11) | NO | MUL | NULL | auto_increment | | site | varchar(100) | NO | PRI | | | | area | varchar(100) | NO | PRI | | | | subarea | varchar(100) | NO | PRI | | | | size | varchar(50) | NO | PRI | | | | pos | varchar(50) | NO | PRI | | | +---------+--------------+------+-----+---------+----------------+ 6 rows in set (0.00 sec) uus (user dimension) mysql> desc uus; +--------------+---------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +--------------+---------------+------+-----+---------+----------------+ | uu_id | int(11) | NO | MUL | NULL | auto_increment | | uu | char(35) | NO | PRI | | | | imps | int(11) | YES | | NULL | | | clicks | int(11) | YES | | NULL | | | gca_clicks | int(11) | YES | | NULL | | | convs | int(11) | YES | | NULL | | | imp_rev | decimal(10,6) | YES | | NULL | | | click_rev | decimal(10,6) | YES | | NULL | | | total_rev | decimal(10,6) | YES | | NULL | | | geo_id | int(11) | YES | | NULL | | | reg | int(11) | YES | | NULL | | | usernum | int(11) | YES | MUL | NULL | | | gender | char(1) | YES | | NULL | | | age | int(11) | YES | | NULL | | | age_grp | char(10) | YES | | NULL | | | reg_date | date | YES | | NULL | | | vis | int(11) | YES | | NULL | | | first_date | date | YES | | NULL | | | first_hour | int(11) | YES | | NULL | | | last_date | date | YES | | NULL | | | utm_campaign | varchar(100) | YES | | NULL | | | utm_medium | varchar(100) | YES | | NULL | | | utm_source | varchar(100) | YES | | NULL | | | utm_content | varchar(255) | YES | | NULL | | | utm_keyword | varchar(255) | YES | | NULL | | +--------------+---------------+------+-----+---------+----------------+ 25 rows in set (0.00 sec) ____________________________________ Mark Jensen ____________________________________________________________________________________ Yahoo! Music Unlimited Access over 1 million songs. http://music.yahoo.com/unlimited -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]