Conceptually, what I'd like to do is join two tables on a hostid for a
result like this non-join version where hostid in the 'IN' come from the
other table.
mysql> explain select avg(load5min) from kstatHostData_20050513 where
hostid IN (250, 400) and localdate > '2005-06-06 13:00:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: kstatHostData_20050513
type: range
possible_keys: hostid_gmtdate_idx,hostid_localdate_idx
key: hostid_localdate_idx
key_len: 11
ref: NULL
rows: 385
Extra: Using where
1 row in set (0.01 sec)
This works great. It correctly picks the hostid_localdate_idx index and
the query is fast.
If I convert this to JOIN a table which contains exactly the same values
( two rows: 250, 400) as in the IN clause above, mysql picks either key
and only using hostid portion to complete the join, e.g.,
mysql> explain select avg(load5min) from kstatHostData_20050513 ks,
hostinfo hi WHERE ks.hostid = hi.hostid and localdate > '2005-06-06
13:00:00'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: hi
type: index
possible_keys: bar_idx
key: bar_idx
key_len: 4
ref: NULL
rows: 2
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: ks
type: ref
possible_keys: hostid_gmtdate_idx,hostid_localdate_idx
key: hostid_localdate_idx
key_len: 3
ref: postinistats.hi.hostid
rows: 32338
Extra: Using where
2 rows in set (0.00 sec)
mysql>
This query, needless to say, is abysmally slow. The actual number of
applicable rows needed, as indicated in the first example where the full
index is used, is approximately 400. It makes sense that either index
will do as they both start with the hostid and you can only use one
index, I'm just not sure how to work around it when the WHERE date
condition signifincantly limits the result. I tried a subquery which
yield results that actually appear worst than the join. (See below).
Short of populating my IN block in the application with a pre-query, or
re-organizing data into summaries to limit rows, I'm out of ideas on how
to improve this. Maybe I'm missing some obvious solution. If anyone
has any thoughts or explanations, I'd really appeciate them. Mysql is
4.1.12-standard.
Thanks,
John
mysql> explain select avg(load5min) from kstatHostData_20050513 WHERE
hostid IN ( SELECT hostid FROM hostinfo) AND localdate > '2005-06-06
13:00:00'\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: kstatHostData_20050513
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 26797461
Extra: Using where
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: hostinfo
type: index_subquery
possible_keys: bar_idx
key: bar_idx
key_len: 4
ref: func
rows: 2
Extra: Using index
2 rows in set (0.00 sec)
mysql>
--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/[EMAIL PROTECTED]