so i'm debating between mysql and postgresql so far for choosing our data 
warehouse platform, and i've been using mysql for about 8 years now and would 
hate to move into postgresql if possible.  mysql definitely has it's pros and 
cons, but so far, more pros than cons except for big joins.  mysql outperforms 
postgresql in simple counts, sums, etc... but is a lot slower in big joins as 
far as i can tell so far.  so as long as i can optimize these queries to work 
in mysql, i'll be saved and can stick with my favorite database.  i'm only 
playing with about 5 days of data so far, but once i hit 30 days, it'll be huge.

so i created a star schema in 5.0.27 so far, and have only about 5 days of data 
in here, equalling about 63 million rows of fact data.  so this will become 
huge over 30 days or a year, which is how long they want to keep raw data for.  
here is the query i'm battling with, and the reason why i have to do a count 
distinct on uu_id, is because everything is based on unique users, not visits.  
if i could use visits, i could easily use that as an aggregate in the fact 
table, but can't.

so let me know if anyone has any experience with building data warehouses in 
mysql and if you have any recommendations on what i might be doing wrong etc... 
or what i can do to optimize.

> select A.gender as gender, B.area as area, sum(C.imps) as imps, sum(C.clicks) 
> as clicks, count(distinct(C.uu_id)) as users
> from uus as A, areas as B, daily_area_fact as C
> where A.uu_id = C.uu_id
> and B.area_id = C.area_id
> group by gender,area;

so this takes about 1.3 hours to run.  postgresql takes about 50 minutes, using 
the same hardware/data/memory/working memory/cpu etc...

fact table:

mysql> desc daily_area_fact;
+-----------+---------------+------+-----+---------+-------+
| Field     | Type          | Null | Key | Default | Extra |
+-----------+---------------+------+-----+---------+-------+
| time_id   | int(11)       | NO   | PRI | 0       |       |
| ccid      | int(11)       | NO   | PRI | 0       |       |
| area_id   | int(11)       | NO   | PRI | 0       |       |
| uu_id     | int(11)       | NO   | PRI | 0       |       |
| geo_id    | int(11)       | NO   | PRI | 0       |       |
| imps      | int(11)       | YES  |     | NULL    |       |
| clicks    | int(11)       | YES  |     | NULL    |       |
| imp_rev   | decimal(10,6) | YES  |     | NULL    |       |
| click_rev | decimal(10,6) | YES  |     | NULL    |       |
| total_rev | decimal(10,6) | YES  |     | NULL    |       |
+-----------+---------------+------+-----+---------+-------+
10 rows in set (0.03 sec)

area dimension table:

mysql> desc areas;
+---------+--------------+------+-----+---------+----------------+
| Field   | Type         | Null | Key | Default | Extra          |
+---------+--------------+------+-----+---------+----------------+
| area_id | int(11)      | NO   | MUL | NULL    | auto_increment |
| site    | varchar(100) | NO   | PRI |         |                |
| area    | varchar(100) | NO   | PRI |         |                |
| subarea | varchar(100) | NO   | PRI |         |                |
| size    | varchar(50)  | NO   | PRI |         |                |
| pos     | varchar(50)  | NO   | PRI |         |                |
+---------+--------------+------+-----+---------+----------------+
6 rows in set (0.00 sec)

uus (user dimension)

mysql> desc uus;
+--------------+---------------+------+-----+---------+----------------+
| Field        | Type          | Null | Key | Default | Extra          |
+--------------+---------------+------+-----+---------+----------------+
| uu_id        | int(11)       | NO   | MUL | NULL    | auto_increment |
| uu           | char(35)      | NO   | PRI |         |                |
| imps         | int(11)       | YES  |     | NULL    |                |
| clicks       | int(11)       | YES  |     | NULL    |                |
| gca_clicks   | int(11)       | YES  |     | NULL    |                |
| convs        | int(11)       | YES  |     | NULL    |                |
| imp_rev      | decimal(10,6) | YES  |     | NULL    |                |
| click_rev    | decimal(10,6) | YES  |     | NULL    |                |
| total_rev    | decimal(10,6) | YES  |     | NULL    |                |
| geo_id       | int(11)       | YES  |     | NULL    |                |
| reg          | int(11)       | YES  |     | NULL    |                |
| usernum      | int(11)       | YES  | MUL | NULL    |                |
| gender       | char(1)       | YES  |     | NULL    |                |
| age          | int(11)       | YES  |     | NULL    |                |
| age_grp      | char(10)      | YES  |     | NULL    |                |
| reg_date     | date          | YES  |     | NULL    |                |
| vis          | int(11)       | YES  |     | NULL    |                |
| first_date   | date          | YES  |     | NULL    |                |
| first_hour   | int(11)       | YES  |     | NULL    |                |
| last_date    | date          | YES  |     | NULL    |                |
| utm_campaign | varchar(100)  | YES  |     | NULL    |                |
| utm_medium   | varchar(100)  | YES  |     | NULL    |                |
| utm_source   | varchar(100)  | YES  |     | NULL    |                |
| utm_content  | varchar(255)  | YES  |     | NULL    |                |
| utm_keyword  | varchar(255)  | YES  |     | NULL    |                |
+--------------+---------------+------+-----+---------+----------------+
25 rows in set (0.00 sec)
 
____________________________________
Mark Jensen




 
____________________________________________________________________________________
Yahoo! Music Unlimited
Access over 1 million songs.
http://music.yahoo.com/unlimited

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to