Joseph Shraibman wrote:
> Richard Huxton wrote:
>> Joseph Shraibman wrote:
>>> I ran a query that was "SELECT field, count(*) INTO TEMP temptable"
>>> and it grew to be 10gig (as reported by top).
>>
>> What was the real query?
>
> First I selected 90634 rows (3 ints) into the first temp table, then I
> did "SELECT intfield1, count(intfield2) FROM realtable rt WHERE rt.id =
> temptable.id AND <other conditions on rt here> GROUP BY intfield1". The
> size of the second temp table should have been no more than 60000 rows.

<SNIP>

>> If your memory settings in postgresql.conf are reasonable (and they
>> look fine), this shouldn't happen. Let's see if an EXPLAIN sheds any
>> light.
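(I'm guessing the snipped section held the usual suspects; you can read
the two that matter back at runtime:

  SHOW shared_buffers;
  SHOW work_mem;

work_mem is the per-sort / per-hash-table limit, so it's the setting a
well-behaved hash join or hash aggregate should stay within.)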

> => explain SELECT ml.uid, count(ml.jid) AS cnt INTO TEMP tempml FROM ml
>    WHERE ml.jid = tempjr1180108653561.id AND ml.status IN(2,5,20)
>    GROUP BY ml.uid;
>
> NOTICE:  adding missing FROM-clause entry for table "tempjr1180108653561"
> LINE 2: ...INTO TEMP tempml FROM ml WHERE ml.jid = tempjr1180...
>                                                              ^

I'm guessing this is just a typo from your test and you'd normally list
the temp table in the FROM clause.
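Spelled out in full, that would be (the alias "t" is mine):

  SELECT ml.uid, count(ml.jid) AS cnt INTO TEMP tempml
    FROM ml, tempjr1180108653561 t
   WHERE ml.jid = t.id
     AND ml.status IN (2, 5, 20)
   GROUP BY ml.uid;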
>                                         QUERY PLAN
> ------------------------------------------------------------------------------------------
>  HashAggregate  (cost=11960837.72..11967601.06 rows=541067 width=8)
>    ->  Hash Join  (cost=9675074.94..11849780.55 rows=22211434 width=8)

Here you seem to have 22 million rows estimated for your join.

>          Hash Cond: (tempjr1180108653561.id = ml.jid)
>          ->  Seq Scan on tempjr1180108653561  (cost=0.00..31.40 rows=2140 width=4)

Is the 2140 rows here a good estimate? You said the first temp table
held 90634 rows, and rows=2140 is the planner's stock guess for a table
it has no statistics on; temp tables never get analysed automatically.
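If so, it's worth analysing the temp table by hand before running the
big query, so the planner works from real numbers:

  ANALYSE tempjr1180108653561;

(ANALYZE works too; both spellings are accepted.)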

>          ->  Hash  (cost=6511767.18..6511767.18 rows=181979021 width=8)
>                ->  Seq Scan on ml  (cost=0.00..6511767.18 rows=181979021 width=8)

OK, so the 22 million matches come about because "ml" is expected to
have about 181 million rows passing the status filter. Is that right too?
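You can sanity-check the planner's idea of the table's overall size
against the catalogues:

  SELECT relname, relpages, reltuples::bigint
    FROM pg_class
   WHERE relname = 'ml';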

>                      Filter: (status = ANY ('{2,5,20}'::integer[]))

Overall it's estimating about 9 times the number of rows you were expecting (541000 vs 60000). Not enough to account for your extreme memory usage.

Let's see if that hash-join is really the culprit. Can you run EXPLAIN and then EXPLAIN ANALYSE on the query, but first issue:

SET enable_hashjoin=off;

If that makes little difference, try the same with enable_hashagg.
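That is, something along these lines (I've dropped the INTO TEMP so the
EXPLAIN ANALYSE runs don't each create a table; the plan underneath is
the same either way, and the SETs only affect your session):

  SET enable_hashjoin = off;
  EXPLAIN ANALYSE
  SELECT ml.uid, count(ml.jid) AS cnt
    FROM ml, tempjr1180108653561 t
   WHERE ml.jid = t.id AND ml.status IN (2, 5, 20)
   GROUP BY ml.uid;

  SET enable_hashjoin = on;
  SET enable_hashagg = off;
  -- ...then the same EXPLAIN and EXPLAIN ANALYSE again.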

--
  Richard Huxton
  Archonet Ltd
