The planner is smarter with GROUP BY than with DISTINCT, so you can rewrite your query as the following, whihc will probaly use a HashAggregate, and be a lot faster :


        SELECT service_id FROM five_min_stats_200408 GROUP BY service_id;

        This won't avoid the Seq Scan however.

If you know you have few different service_ids, you can code a plpgsql function which does this :

SELECT INTO value service_id FROM five_min_stats_200408 ORDER BY service_id ASC LIMIT 1;
WHILE FOUND
RETURN NEXT value
SELECT INTO value service_id FROM five_min_stats_200408 WHERE service_is>value ORDER BY service_id ASC LIMIT 1;
END WHILE


Basically it skips from one value to the next using your index, and returns them as they come. You'll get one indexed SELECT by distinct value. If you have, say 100 distinct values in 1M rows it'll be many orders of magniude faster.



estat=> explain analyze select distinct(service_id) from
five_min_stats_200408;
QUERY PLAN
------------------------------------------------------------------------
------------------------------------------------------------------------
----
 Unique  (cost=13578354.70..13894902.76 rows=726 width=12) (actual
time=1227906.271..1282110.055 rows=879 loops=1)
   ->  Sort  (cost=13578354.70..13736628.73 rows=63309612 width=12)
(actual time=1227906.266..1255961.318 rows=63359396 loops=1)
         Sort Key: service_id
         ->  Seq Scan on five_min_stats_200408  (cost=0.00..1668170.12
rows=63309612 width=12) (actual time=0.061..80398.222 rows=63359396
loops=1)
 Total runtime: 1284212.556 ms
(5 rows)

Time: 1284213.359 ms




---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend

Reply via email to