With Postgresql 8.1.9 -- I have a simple group by query:

SELECT target_page_id, min(created_at)
FROM page_page_link
GROUP BY 1;

The page_page_link table has ~130 million rows.

After analyzing the table, the planner picks a hash aggregate plan, which
results in an out of memory error.

crystal=> analyze page_page_link;
ANALYZE
crystal=> explain
crystal-> SELECT target_page_id as page_id, min(created_at) as created_at
crystal-> FROM page_page_link
crystal-> GROUP By 1
crystal-> ;
                                   QUERY
PLAN
-----------------------------------------------------------------------------------
HashAggregate  (cost=3663517.88..3670393.09 rows=550017 width=12)
  ->  Seq Scan on page_page_link  (cost=0.00..2993649.92 rows=133973592
width=12)
(2 rows)

The default_statistics_target was originally 200.
I upped it to 1000 and still get the same results.

crystal=> show default_statistics_target;
default_statistics_target
---------------------------
1000
(1 row)

crystal=> set enable_hashagg = off;
SET
crystal=> explain
crystal-> SELECT target_page_id as page_id, min(created_at) as created_at
crystal-> FROM page_page_link
crystal-> GROUP BY 1
crystal-> ;
                                      QUERY
PLAN
-----------------------------------------------------------------------------------------
GroupAggregate  (cost=27240841.37..28252518.53 rows=550017 width=12)
  ->  Sort  (cost=27240841.37..27575775.35 rows=133973592 width=12)
        Sort Key: target_page_id
        ->  Seq Scan on page_page_link
(cost=0.00..2993649.92rows=133973592 width=12)
(4 rows)

crystal=>

I am working around this by setting enable_hashagg = off  -- but it just
seems like a case where the planner is not picking the strategy?

Is there another setting I can change to help make better decisions?

thanks in advance,

Mason

Reply via email to