Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
> samples    %        symbol name
> 350318533  98.8618  mergepreread
> 971822      0.2743  tuplesort_gettuple_common
> 413674      0.1167  tuplesort_heap_siftup

I don't have enough memory to really reproduce this, but I've come close enough that I believe I see what's happening. It's an artifact of the code I added recently to prevent the SortTuple array from growing during the merge phase, specifically the "mergeslotsfree" logic. You can get into a state where mergeslotsfree is at the lower limit of what the code will allow, and then if there's a run of tuples that should come from a single tape, mergepreread ends up sucking just one tuple per call from that tape --- and with the outer loop over 28000 tapes that aren't doing anything, each call is pretty expensive.

I had mistakenly assumed that the mergeslotsfree limit would be a seldom-hit corner case, but it seems it's not so hard to get into that mode after all. The code really needs to do a better job of sharing the available array slots among the tapes. Probably the right answer is to allocate so many free array slots to each tape, similar to the per-tape limit on memory usage --- I had thought that the memory limit would cover matters but it doesn't.

Another thing I am wondering about is the code's habit of prereading from all tapes when one goes empty. This is clearly pretty pointless in the final-merge-pass case: we might as well just reload from the one that went empty, and not bother scanning the rest. However, in the scenario where we are rewriting the data to tape, I think we still need the preread-from-all behavior in order to keep things efficient in logtape.c. logtape likes it if you alternate a lot of reads with a lot of writes, so once you've started reading you really want to refill memory completely.

It might also be worth remembering the index of the last active tape so that we don't iterate over thousands of uninteresting tapes in mergepreread.
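
To put rough numbers on the cost argument, here is a toy model in C. It is only a sketch and not the tuplesort.c code: the function names, the even split of slots, and the idea of counting "tape visits" are all assumptions made for illustration. It compares the shared pool stuck at its lower limit (one tuple per mergepreread call, every tape scanned each time) with a per-tape slot quota where only the tape that ran dry gets reloaded.

```c
#include <assert.h>

/*
 * Hypothetical toy model, not taken from tuplesort.c.  A "visit" is one
 * iteration of the preread loop over a tape.
 */

/*
 * Shared slot pool stuck at its lower limit: every preread call fetches a
 * single tuple from the busy tape but still loops over all ntapes tapes.
 */
long
visits_shared_pool(long ntapes, long run_len)
{
    assert(ntapes > 0 && run_len >= 0);
    return ntapes * run_len;
}

/*
 * Per-tape quota (assumed design): the busy tape refills `quota` slots at a
 * time, and only that tape is visited when it goes empty.
 */
long
visits_per_tape_quota(long run_len, long quota)
{
    assert(quota > 0 && run_len >= 0);
    return (run_len + quota - 1) / quota;   /* ceiling division */
}

/*
 * Split the array slots evenly among the tapes, giving each tape at least
 * one slot so that it can always make progress.
 */
long
slots_per_tape(long total_slots, long ntapes)
{
    long        share;

    assert(ntapes > 0 && total_slots >= 0);
    share = total_slots / ntapes;
    return (share > 0) ? share : 1;
}
```

With 28000 tapes and an illustrative run of 1000 tuples from one tape, visits_shared_pool(28000, 1000) gives 28,000,000 tape visits, while visits_per_tape_quota(1000, 32) gives 32 reloads of the one busy tape. The run length and the quota of 32 are made-up figures; only the tape count comes from the case above.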

			regards, tom lane