Hi hackers, I'd submit an implementation of multi-key sort for review. Please see the code as attachment. Thanks for your reponse in advance.
Overview -------- MKsort (multi-key sort) is an alternative of standard qsort algorithm, which has better performance for particular sort scenarios, i.e. the data set has multiple keys to be sorted. The implementation is based on the paper: Jon L. Bentley and Robert Sedgewick, "Fast Algorithms for Sorting and Searching Strings", Jan 1997 [1] MKsort is applied only for tuple sort by the patch. Theoretically it can be applied for general-purpose sort scenario when there are multiple sort keys available, but it is relatively difficult in practice because kind of unique interface is needed to manipulate the keys. So I limit the usage of mksort to sort SortTuple. Comparing to classic quick sort, it can get significant performance improvement once multiple keys are available. A rough test shows it got ~129% improvement than qsort for ORDER BY on 6 keys, and ~52% for CREATE INDEX on the same data set. (See more details in section "Performance Test") Author: Yao Wang <yaowa...@outlook.com> Co-author: Hongxu Ma <inte...@outlook.com> Scope ----- The interface of mksort is pretty simple: in tuplesort_sort_memtuples(), mksort_tuple() is invoked instead of qsort_tuple() if mksort is applicable. The major logic in mksort_tuple() is to apply mksort algorithm on SortTuple, and kind of callback mechanism is used to handle sort-variant-specific issue, e.g. comparing different datums, like qsort_tuple() does. It also handles the complexity of "abbreviated keys". A small difference from classic mksort algorithm is: for IndexTuple, when all the columns are equal, an additional comparing based on ItemPointer is performed to determine the order. It is to make the result consistent to existing qsort. I did consider about implementing mksort by the approach of kind of template mechanism like qsort (see sort_template.h), but it seems unnecessary because all concrete tuple types need to be handled are derived from SortTuple. Use callback to isolate type specific features is good enough. Note that not all tuple types are supported by mksort. Please see the comments inside tuplesort_sort_memtuples(). Test Cases ---------- The changes of test cases include: * Generally, mksort should generate result exactly same to qsort. However some test cases don't. The reason is that SQL doesn't specify order on all possible columns, e.g. "select c1, c2 from t1 order by c1" will generate different results between mksort/qsort when c1 values are equal, and the solution is to order c2 as well ("select c1, c2 from t1 order by c1, c2"). (e.g. geometry) * Some cases need to be updated to display the new sort method "multi-key sort" in explain result. (e.g. incremental_sort) * regress/tuplesort was updated with new cases to cover some scenarios of mksort. Performance Test ---------------- The script I used to configure the build: CFLAGS="-O3 -fargument-noalias-global -fno-omit-frame-pointer -g" ./configure --prefix=$PGHOME --with-pgport=5432 --with-perl --with-openssl --with-python --with-pam --with-blocksize=16 --with-wal-blocksize=16 --with-perl --enable-tap-tests --with-gssapi --with-ldap I used the script for a rough test for ORDER BY: \timing on create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100)); insert into t1 values (generate_series(1,499999), 0, 0, 0, 0, 'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb'); update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3; update t1 set c6 = 'aaaaaaaaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbb' || (c1 % 5)::text; -- Use a large work mem to ensure the entire sort happens in memory set work_mem='1GB'; -- switch between qsort/mksort set enable_mk_sort=off; explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1; Results: mksort: 1341.283 ms (00:01.341) 1379.098 ms (00:01.379) 1369.868 ms (00:01.370) qsort: 3137.277 ms (00:03.137) 3147.771 ms (00:03.148) 3131.887 ms (00:03.132) The perf improvement is ~129%. Another perf test for CREATE INDEX: create index idx_t1_mk on t3 (c6, c5, c4, c3, c2, c1); Results: mksort: 1147.207 ms (00:01.147) 1200.501 ms (00:01.201) 1235.657 ms (00:01.236) Qsort: 1852.957 ms (00:01.853) 1824.209 ms (00:01.824) 1808.781 ms (00:01.809) The perf improvement is ~52%. Another test is to use one of queries of TPC-H: set work_mem='1GB'; -- query rewritten from TPCH-Q1, and there are 6001215 rows in lineitem explain analyze select l_returnflag,l_linestatus,l_quantity,l_shipmode from lineitem where l_shipdate <= date'1998-12-01' - interval '65 days' order by l_returnflag,l_linestatus,l_quantity,l_shipmode; Result: Qsort: 14582.626 ms 14524.188 ms 14524.111 ms mksort: 11390.891 ms 11647.065 ms 11546.791 ms The perf improvement is ~25.8%. [1] https://www.cs.tufts.edu/~nr/cs257/archive/bob-sedgewick/fast-strings.pdf [2] https://www.tpc.org/tpch/ Thanks, Yao Wang
0001-Implement-multi-key-sort.patch
Description: 0001-Implement-multi-key-sort.patch