On Thu, Feb 4, 2016 at 1:46 AM, Peter Geoghegan <p...@heroku.com> wrote: > It wasn't my original insight that replacement selection has become > all but obsolete. It took me a while to come around to that point of > view.
Nyberg et al may have said it best in 1994, in the Alphasort Paper [1]: "By comparison, OpenVMS sort uses a pure replacement-selection sort to generate runs (Knuth, 1973). Replacement-selection is best for a memory-constrained environment. On average, replacement-selection generates runs that are twice as large as available memory, while the QuickSort runs are typically less than half of available memory. However, in a memory-rich environment, QuickSort is faster because it is simpler, makes fewer exchanges on average, and has superior address locality to exploit processor caching. " (I believe that the authors state that "QuickSort runs are typically less than half of available memory" because of the use of explicit asynchronous I/O in each thread, which doesn't apply to us). The paper also has very good analysis of the economics of sorting: "Even for surprisingly large sorts, it is economical to perform the sort in one pass." Of course, memory capacities have scaled enormously in the 20 years since this analysis was performed, so the analysis applies even at the very low end these days. The high capacity memory system that they advocate to get a one pass sort (instead of having faster disks) had 100MB of memory, which is of course tiny by contemporary standards. If you pay Heroku $7 a month, you get a "Hobby Tier" database with 512MB of memory. The smallest EC2 instance size, the t2.nano, costs about $1.10 to run for one week, and has 0.5GB of memory. The economics of using 4MB or even 20MB to sort 10GB of data are already preposterously bad for everyone that runs a database server, no matter how budget conscious they may be. I can reluctantly accept that we need to still use a heap with very low work_mem settings to avoid the risk of a regression (in the event of a strong correlation) on general principle, but I'm well justified in proposing "just don't do that" as the best practical advice. I thought I had your agreement on that point, Robert; is that actually the case? [1] http://www.cs.berkeley.edu/~rxin/db-papers/alphasort.pdf -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers