> The problem is that the HashAgg will have to maintain a counter for
> every distinct value of trid, not just those that occur more than
> once.  So if there are a huge number of one-time-only values you could
> still blow out memory (and HashAgg doesn't currently know how to spill
> to disk).
In my case one-time-only values are the more probable, so it would have to keep a lot of counters.
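(If the planner still insisted on HashAgg even once the statistics are better, it can be disabled for the session; a minimal sketch using the standard enable_hashagg planner setting:

    SET enable_hashagg = off;  -- steer the planner toward sort-based grouping
    SELECT trid, count(*) FROM pptran GROUP BY trid HAVING count(*) > 1;
    RESET enable_hashagg;      -- restore the default afterwards
)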

That "rows=200" estimate looks suspiciously like a default.  Has this
table been ANALYZEd recently?  I'd expect the planner not to choose
HashAgg if it has a more realistic estimate of the number of groups.

                        regards, tom lane
I did a VACUUM ANALYZE, and now EXPLAIN gives different results.

pointspp=# vacuum analyze;
VACUUM

pointspp=# explain select trid, count(*) from pptran group by trid having count(*) > 1;
                                  QUERY PLAN
--------------------------------------------------------------------------------
GroupAggregate  (cost=9842885.29..10840821.57 rows=36288592 width=18)
  Filter: (count(*) > 1)
  ->  Sort  (cost=9842885.29..9933606.77 rows=36288592 width=18)
        Sort Key: trid
        ->  Seq Scan on pptran  (cost=0.00..1039725.92 rows=36288592 width=18)
(5 rows)
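For what it's worth, the distinct-value estimate that drives this plan can be inspected directly; a quick sketch against the standard pg_stats view:

    SELECT n_distinct
      FROM pg_stats
     WHERE tablename = 'pptran'
       AND attname = 'trid';

A negative n_distinct is a fraction of the row count; -1 means the planner believes every value is unique.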

pointspp=# select trid, count(*) from pptran group by trid having count(*) > 1;
ERROR:  could not write block 661572 of temporary file: No space left on device
HINT:  Perhaps out of disk space?
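The numbers line up with the hint: assuming the default 8 KB block size, failing at block 661572 of the temporary sort file means roughly 661572 x 8 KB, or about 5 GB, had already been written, which is right at the free space reported below.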

I have 5.1 GB of free disk space. If this is the cause, I have a problem... or is there another way to extract (and remove) duplicate rows?
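One common recipe, sketched here under two assumptions: that rows sharing a trid are true duplicates of which one copy should survive, and that your PostgreSQL version supports comparing ctid values (if not, the same idea works with any unique column):

    -- Keep the physically first row of each trid group, delete the rest.
    -- ctid is a row's physical address; it is only stable within the
    -- statement, so do this in one DELETE rather than in stages.
    DELETE FROM pptran a
     USING pptran b
     WHERE a.trid = b.trid
       AND a.ctid > b.ctid;

On a 36-million-row table the self-join would benefit greatly from an index on trid (e.g. CREATE INDEX pptran_trid_idx ON pptran (trid); the index name is arbitrary), though building one needs sort space of its own.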

