With Postgresql 8.1.9 -- I have a simple group by query:
SELECT target_page_id, min(created_at) FROM page_page_link GROUP BY 1; The page_page_link table has ~130 million rows. After analyzing the table, the planner picks a hash aggregate plan, which results in an out of memory error. crystal=> analyze page_page_link; ANALYZE crystal=> explain crystal-> SELECT target_page_id as page_id, min(created_at) as created_at crystal-> FROM page_page_link crystal-> GROUP By 1 crystal-> ; QUERY PLAN ----------------------------------------------------------------------------------- HashAggregate (cost=3663517.88..3670393.09 rows=550017 width=12) -> Seq Scan on page_page_link (cost=0.00..2993649.92 rows=133973592 width=12) (2 rows) The default_statistics_target was originally 200. I upped it to 1000 and still get the same results. crystal=> show default_statistics_target; default_statistics_target --------------------------- 1000 (1 row) crystal=> set enable_hashagg = off; SET crystal=> explain crystal-> SELECT target_page_id as page_id, min(created_at) as created_at crystal-> FROM page_page_link crystal-> GROUP BY 1 crystal-> ; QUERY PLAN ----------------------------------------------------------------------------------------- GroupAggregate (cost=27240841.37..28252518.53 rows=550017 width=12) -> Sort (cost=27240841.37..27575775.35 rows=133973592 width=12) Sort Key: target_page_id -> Seq Scan on page_page_link (cost=0.00..2993649.92rows=133973592 width=12) (4 rows) crystal=> I am working around this by setting enable_hashagg = off -- but it just seems like a case where the planner is not picking the strategy? Is there another setting I can change to help make better decisions? thanks in advance, Mason