Re: [SQL] identifying duplicates in table with redundancies

Oliveiros d'Azevedo Cristina Wed, 29 Sep 2010 06:00:49 -0700


Hallo Andreas,
I reduced the problem to the innermost query:

1) SELECT DISTINCT trainer_id, trainer_name FROM student
This results in a sequential table scan. Execution time 7500ms.

2) I created an INDEX ON (trainer_id, trainer_name). Then I had an index
scan instead, which still cost 7000ms.

3) Next, I changed from DISTINCT to GROUP BY:
SELECT trainer_id, trainer_name FROM student
GROUP BY trainer_id, trainer_name
This resulted in an index scan @ 6750ms

4) I filtered out NULL trainer_ids
WHERE trainer_id IS NOT NULL
Amazingly, this resulted in a sequential table scan, which only took
1300ms!!

Please, explain (pun not intended)! How can this be. Only 11000/250000
rows have a null trainer_id.


That's an impressive improvement...

Personally I have no idea what caused it, specially when you say it wassequential :-|

Warmed caches ?

Best,

Oliver


--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql

Re: [SQL] identifying duplicates in table with redundancies

Reply via email to