On 09/11/2017 02:22 AM, Peter Geoghegan wrote:
> On Sun, Sep 10, 2017 at 5:07 PM, Tomas Vondra
> <tomas.von...@2ndquadrant.com> wrote:
>> I'm currently re-running the benchmarks we did in 2016 for 9.6, but
>> those are all sorts with a single column (see the attached script). But
>> it'd be good to add a few queries testing sorts with multiple keys. We
>> can either tweak some of the existing data sets + queries, or come up
>> with entirely new tests.
>
> I see that work_mem is set like this in the script:
>
> "for wm in '1MB' '8MB' '32MB' '128MB' '512MB' '1GB'; do"
>
> I suggest that we forget about values over 32MB, since the question of
> how well quicksort does there was settled by your tests in 2016. I
> would also add '4MB' to the list of wm values that you'll actually
> test.

OK, so 1MB, 4MB, 8MB, 32MB?

> Any case with input that is initially in random order or DESC sort
> order is not interesting, either. I suggest you remove those, too.

OK.

> I think we're only interested in benchmarks where replacement
> selection really does get its putative best case (no merge needed in
> the end). Any (almost) sorted cases (the only cases that are
> interesting to test now) will always manage that, once you set
> replacement_sort_tuples high enough, and provided there isn't even a
> single tuple that is completely out of order. The "before" cases here
> should have a replacement_sort_tuples of 1 billion (so that we're sure
> to not have the limit prevent the use of replacement selection in the
> first place), versus the "after" cases, which should have a
> replacement_sort_tuples of 0 to represent my proposal (to represent
> performance in a world where replacement selection is totally
> removed).

Ah, so you suggest doing all the tests on current master, by only
tweaking the replacement_sort_tuples value? I've been testing master
vs. your patch, but I guess setting replacement_sort_tuples=0 should
have the same effect.
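To be concrete, I'm thinking of driving it roughly like this (just a
sketch - the "sort_test" table, the single-column query and the
results.log file are placeholders, the real ones come from the test
script):

    for wm in '1MB' '4MB' '8MB' '32MB'; do
        for rst in 1000000000 0; do    # "before" vs. "after" (replacement selection disabled)
            psql -c "SET work_mem = '$wm';
                     SET replacement_sort_tuples = $rst;
                     EXPLAIN ANALYZE SELECT a FROM sort_test ORDER BY a;" >> results.log
        done
    done

That keeps a single binary (current master) and only flips the two
GUCs, which should make the before/after comparison easy to automate.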
I probably won't eliminate the random/DESC data sets, though. At least
not from the two smaller data sets - I want to do a bit of benchmarking
on Heikki's polyphase merge removal patch, and for that patch those
data sets are still relevant. Also, it's useful to have a subset of
results where we know we don't expect any change.

>> For the existing queries, I should have some initial results
>> tomorrow, at least for the data sets with 100k and 1M rows. The
>> tests with 10M rows will take much more time (it takes 1-2 hours for
>> a single work_mem value, and we're testing 6 of them).
>
> I myself don't see that much value in a 10M row test.

Meh, more data is probably better. And with the reduced work_mem values
and skipping of random/DESC data sets it should complete much faster.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services