On 09/11/2017 01:03 AM, Peter Geoghegan wrote: > On Sat, Sep 9, 2017 at 8:28 AM, Greg Stark <st...@mit.edu> wrote: >> This may be a bit "how long is a piece of string" but how do those two >> compare with string sorting in an interesting encoding/locale -- say >> /usr/share/dict/polish in pl_PL for example. It's certainly true that >> people do sort text as well as numbers. > > I haven't looked at text, because I presume that it's very rare for > tuples within a table to be more or less ordered by a text > attribute. While the traditional big advantage of replacement > selection has always been its ability to double initial run length on > average, where text performance is quite interesting because > localized clustering still happens, that doesn't really seem relevant > here. The remaining use of replacement selection is expressly only > about the "single run, no merge" best case, and in particular about > avoiding regressions when upgrading from versions prior to 9.6 (9.6 > is the version where we began to generally prefer quicksort). > >> Also, people often sort on keys of more than one column.... > > Very true. I should test this. >
I'm currently re-running the benchmarks we did in 2016 for 9.6, but those are all sorts with a single column (see the attached script). But it'd be good to add a few queries testing sorts with multiple keys. We can either tweak some of the existing data sets + queries, or come up with entirely new tests. The current tests construct data sets with different features (unique or low/high-cardinality, random/sorted/almost-sorted). How should that work for multi-key sorts? Same features for all columns, or some mix? For the existing queries, I should have some initial results tomorrow, at least for the data sets with 100k and 1M rows. The tests with 10M rows will take much more time (it takes 1-2hours for a single work_mem value, and we're testing 6 of them). But perhaps we can reduce the number of tests somehow, only testing the largest/smallest work_mem values? So that we could get some numbers now, possibly running the complete test suite later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
sort-bench.sh
Description: application/shellscript
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers