Re: [HACKERS] problem with large maintenance_work_mem settings and

Tom Lane Fri, 10 Mar 2006 12:10:36 -0800

Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> samples    %        symbol name
> 350318533  98.8618  mergepreread
> 971822     0.2743   tuplesort_gettuple_common
> 413674     0.1167   tuplesort_heap_siftup


I don't have enough memory to really reproduce this, but I've come close
enough that I believe I see what's happening.  It's an artifact of the
code I added recently to prevent the SortTuple array from growing during
the merge phase, specifically the "mergeslotsfree" logic.  You can get
into a state where mergeslotsfree is at the lower limit of what the code
will allow, and then if there's a run of tuples that should come from a
single tape, mergepreread ends up sucking just one tuple per call from
that tape --- and with the outer loop over 28000 tapes that aren't doing
anything, each call is pretty expensive.  I had mistakenly assumed that
the mergeslotsfree limit would be a seldom-hit corner case, but it seems
it's not so hard to get into that mode after all.  The code really needs
to do a better job of sharing the available array slots among the tapes.
Probably the right answer is to allocate a fixed number of free array
slots to each tape, similar to the per-tape limit on memory usage --- I
had thought that the memory limit would cover matters, but it doesn't.
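
To illustrate the per-tape allocation idea, here is a minimal sketch. It is
not the tuplesort.c code: TapeState, per_tape_slot_quota and preread_tape
are hypothetical names, and tuples are modelled as plain counters. The
point is only that a tape with its own slot quota can be refilled in bulk
instead of one tuple per call.

```c
#include <assert.h>

/* Hypothetical per-tape bookkeeping; not the real tuplesort.c structures. */
typedef struct TapeState
{
    int slots_allowed;  /* quota of array slots reserved for this tape */
    int slots_used;     /* slots currently holding preread tuples */
    int tuples_left;    /* tuples not yet read from the tape */
} TapeState;

/* Split the array evenly among the tapes, giving each at least one slot. */
int
per_tape_slot_quota(int total_slots, int ntapes)
{
    int quota;

    assert(ntapes > 0);
    quota = total_slots / ntapes;
    return quota > 0 ? quota : 1;
}

/* Refill one tape up to its own quota; returns the number of tuples read. */
int
preread_tape(TapeState *tape)
{
    int nread = 0;

    while (tape->slots_used < tape->slots_allowed && tape->tuples_left > 0)
    {
        tape->slots_used++;
        tape->tuples_left--;
        nread++;
    }
    return nread;
}
```

With 280000 slots and 28000 tapes (the slot count is an assumed figure),
each tape gets 10 slots, so a run of tuples from one tape costs one preread
call per 10 tuples rather than one call per tuple.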

Another thing I am wondering about is the code's habit of prereading
from all tapes when one goes empty.  This is clearly pretty pointless in
the final-merge-pass case: we might as well just reload from the one
that went empty, and not bother scanning the rest.  However, in the
scenario where we are rewriting the data to tape, I think we still need
the preread-from-all behavior in order to keep things efficient in
logtape.c.  logtape likes it if you alternate a lot of reads with a lot
of writes, so once you've started reading you really want to refill
memory completely.
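
A rough sketch of that distinction, under the same caveats: MergeTape,
refill_tape and refill_after_empty are made-up names and the tapes are
simulated with counters. In the final merge pass only the tape that went
empty is reloaded; when the output is being rewritten to tape, every tape
is refilled so that reads and writes stay batched.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one input tape during a merge. */
typedef struct MergeTape
{
    int buffered;   /* tuples currently held in memory for this tape */
    int on_tape;    /* tuples still unread on the tape */
    int quota;      /* most tuples this tape may hold in memory */
} MergeTape;

/* Read from one tape until its quota is full or the tape is exhausted. */
void
refill_tape(MergeTape *t)
{
    while (t->buffered < t->quota && t->on_tape > 0)
    {
        t->buffered++;
        t->on_tape--;
    }
}

/*
 * Called when tape `empty` has no buffered tuples left.
 * Returns the number of tapes visited.
 */
int
refill_after_empty(MergeTape *tapes, int ntapes, int empty, bool final_merge)
{
    assert(empty >= 0 && empty < ntapes);

    if (final_merge)
    {
        /* Nothing is being written, so reload just the empty tape. */
        refill_tape(&tapes[empty]);
        return 1;
    }

    /* Rewriting to tape: fill memory completely before writing resumes. */
    for (int i = 0; i < ntapes; i++)
        refill_tape(&tapes[i]);
    return ntapes;
}
```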

It might also be worth remembering the index of the last active tape so
that we don't iterate over thousands of uninteresting tapes in
mergepreread.
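
One way this could look, as a sketch only: ActiveTapes and its functions
are hypothetical, and the real code would need to fit whatever state
tuplesort.c already keeps. The idea is to track the highest tape index
that is still active and stop the scan there.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tracker for which tapes still have data. */
typedef struct ActiveTapes
{
    bool *active;       /* active[i] is true while tape i has tuples */
    int   last_active;  /* highest i with active[i] true, or -1 if none */
} ActiveTapes;

void
init_active_tapes(ActiveTapes *s, bool *active, int ntapes)
{
    s->active = active;
    s->last_active = -1;
    for (int i = 0; i < ntapes; i++)
    {
        if (active[i])
            s->last_active = i;
    }
}

/* Mark a tape exhausted and pull the upper bound down past dead tapes. */
void
deactivate_tape(ActiveTapes *s, int tape)
{
    assert(tape >= 0);
    s->active[tape] = false;
    while (s->last_active >= 0 && !s->active[s->last_active])
        s->last_active--;
}

/* Scan only up to last_active; returns how many live tapes were found. */
int
count_active_tapes(const ActiveTapes *s)
{
    int live = 0;

    for (int i = 0; i <= s->last_active; i++)
    {
        if (s->active[i])
            live++;
    }
    return live;
}
```

This only trims inactive tapes at the high end of the array; inactive tapes
below last_active are still visited, so a linked list of active tapes would
be the more thorough alternative.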

                        regards, tom lane
