On Sun, Dec 28, 2014 at 12:45 PM, Peter Geoghegan <p...@heroku.com> wrote:
> On Sun, Dec 28, 2014 at 12:37 PM, Jeff Davis <pg...@j-davis.com> wrote:
> > Do others have similar numbers? I'm quite surprised at how little
> > work_mem seems to matter for these plans (HashJoin might be a different
> > story though). I feel like I made a mistake -- can someone please do a
> > sanity check on my numbers?
>
> I have seen external sorts that were quicker than internal sorts
> before. With my abbreviated key patch, under certain circumstances
> external sorts are faster, while presumably the same thing is true of
> int4 attribute sorts today. Actually, I saw a 10MB work_mem setting
> that was marginally faster than a multi-gigabyte one that fit the
> entire sort in memory. It probably has something to do with caching
> effects dominating over the expense of more comparisons, since higher
> work_mem settings that still resulted in an external sort were slower
> than the 10MB setting.
>
> I was surprised by this too, but it has been independently reported by
> Jeff Janes.

I don't recall (at the moment) seeing our external sort actually run faster
than quicksort, but I've very reliably seen external sorts get faster with
less memory than with more. It is almost certainly a CPU caching issue. Very
large simple binary heaps are horrible on the CPU cache. And for
sort-by-reference values, quicksort is also pretty bad.

With a slow enough data bus between the CPU and main memory, I don't doubt
that a 'tapesort' with small work_mem could actually be faster than a
quicksort with large work_mem, although I don't recall seeing it myself. I'd
be surprised, though, if a tapesort as currently implemented were faster than
a quicksort when the tapesort is using just one byte less memory than the
quicksort is.

But to Jeff Davis's question: yes, tapesort is not very sensitive to
work_mem, and to the extent that it is sensitive, it is in the other
direction, with more memory being bad.

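To put rough numbers on why work_mem matters so little to a tapesort, here is a back-of-envelope sketch in Python. It is a simplified model, not PostgreSQL's actual accounting: it assumes each initial run is about work_mem in size, that every run being merged needs one fixed-size read buffer, and that the merge is a plain balanced multi-way merge. The function name and the 256 kB default buffer size are hypothetical, chosen only for illustration.

```python
def merge_passes(data_bytes, work_mem_bytes, buffer_per_run_bytes=256 * 1024):
    """Estimate how many merge passes a simplified external sort needs.

    Assumptions (illustrative only, not PostgreSQL's real behaviour):
      - each initial sorted run is roughly work_mem in size
      - each run taking part in a merge needs one read buffer of
        buffer_per_run_bytes, so fan-in = work_mem / buffer size
      - runs are merged in balanced passes until one run remains
    """
    # Ceiling division: number of initial runs written to "tape".
    runs = -(-data_bytes // work_mem_bytes)
    if runs <= 1:
        return 0  # everything fits in memory, so there is no merge at all

    # How many runs can be merged at once with this much memory.
    fan_in = max(2, work_mem_bytes // buffer_per_run_bytes)

    passes = 0
    while runs > 1:
        runs = -(-runs // fan_in)  # each pass merges groups of fan_in runs
        passes += 1
    return passes


if __name__ == "__main__":
    MiB, GiB = 1024 ** 2, 1024 ** 3
    for work_mem in (1 * MiB, 10 * MiB, 64 * MiB, 1 * GiB):
        print(f"work_mem={work_mem // MiB:>5} MiB  "
              f"passes={merge_passes(16 * GiB, work_mem)}")
```

Under this model a single merge pass is enough whenever the number of runs does not exceed the fan-in, which works out to roughly data size <= work_mem squared divided by the buffer size. The pass count therefore grows only logarithmically as work_mem shrinks.
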
Once work_mem is so small that it takes multiple passes over the data to do
the merge, then small memory would really be a problem. But on modern
hardware you have to get pretty absurd settings before that happens.

Cheers,

Jeff