Matthias,

I think that MySQL is doing what would be expected, namely an index scan
which reads entire index on IP and selects distinct values of IP.  Whatever
you do, it's going to read 10,991,123 of something and select distinct
values.  The use of the index saves a possible sort and allows reading a
smaller record than the full table.

A clever programmer, knowing that there are a handfull of values for IP,
could probably read the index structure and figure out how to avoid reading
all blocks of the index.  However, this is a bit of a special case, and the
MySQL optimizer isn't that smart.  You probably aren't going to do much by
tinkering with the indexes.

If you need to do this query frequently, you could add a table containing
just the IP values.  You would add to it (INSERT IGNORE) when adding to the
larger table.  You would either
(1)  Remember to check for a delete on the smaller table when deleting from
the larger table, or
(2)  Use a LEFT JOIN in your query to select those values in the smaller
table which exist in the larger table; you would then rebuild the smaller
table from time to time when it has too many obsolete entries. (The left
join should be fast, as it only has to look for one of each possible IP in
the larger table.)

HTH, Bill

Matthias Urlichs wrote:

>We have a slight opimization problem here.
>
>Given this table:
>
>CREATE TABLE `test` (
>  `IP` varchar(15) collate latin1_german1_ci NOT NULL default '',
>  `Type` enum('WARN','ERROR','FATAL','UNKNOWN') collate latin1_german1_ci
NOT NULL default 'WARN',
>  `epoch` int(10) unsigned NOT NULL default '0',
>  KEY `Trap` (`IP`,`Type`,`epoch`),
>  KEY `IP` (`IP`)
>
>)
>
>... containing ten million records; the "IP" column holds only a handful
>of distinct values. Given this, I would expect a "select distinct ip"
>to return immediately.
>
>However,
>
>
>
>>explain select distinct ip from test;
>>
>>
>+----+-------------+-------+-------+---------------+------+---------+------
+----------+-------------+
>| id | select_type | table | type  | possible_keys | key  | key_len | ref
| rows     | Extra       |
>+----+-------------+-------+-------+---------------+------+---------+------
+----------+-------------+
>|  1 | SIMPLE      | test  | index | NULL          | IP   |      15 | NULL
| 10991123 | Using index |
>+----+-------------+-------+-------+---------------+------+---------+------
+----------+-------------+
>
>takes a *long* time and obviously scans the whole table.
>
>Ideas, anybody?
>
>MyISAM vs. InnoDB behave identically.
>4.0 or 4.1.5 also didn't make a difference; I didn't test 5.0, as this
>is supposed to be a production system.


-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:    http://lists.mysql.com/[EMAIL PROTECTED]

Reply via email to