Hi, I've run a few tests to get some feel for the effects of various comparators on Datums containing int32. I've attached the full results, as well as the (messy) patch which applies on top of 0012 to run the tests. I'll excerpt some of those results as I go through them here. For now, I only ran input orders of sorted, random, and reversed.
1) Specializing This is a win in all cases, including SQL-callable comparators (the case here is for _bt_sort_array_elements). NOTICE: [traditional qsort] size=8MB, order=random, cmp=arg, test=2, time=0.140526 NOTICE: [inlined] size=8MB, order=random, cmp=inline, test=0, time=0.085023 NOTICE: [SQL arg] size=8MB, order=random, cmp=SQL-arg, test=2, time=0.256708 NOTICE: [SQL inlined] size=8MB, order=random, cmp=SQL-inline, test=0, time=0.192063 2) Branchless operations The int case is for how to perform the comparison, and the SQL case is referring to how to reverse the sort order.Surprisingly, they don't seem to help for direct comparisons, and in fact they seem worse. I'll have to dig a bit deeper to be sure, but it's not looking good now. NOTICE: [inlined] size=8MB, order=random, cmp=inline, test=2, time=0.084781 NOTICE: [branchless] size=8MB, order=random, cmp=branchless, test=0, time=0.091837 NOTICE: [SQL inlined] size=8MB, order=random, cmp=SQL-inline, test=2, time=0.192018 NOTICE: [SQL inlined reverse] size=8MB, order=random, cmp=SQL-inline-rev, test=0, time=0.190797 When the effect is reversing a list, the direct comparisons seem much worse, and the SQL ones aren't helped. NOTICE: [inlined] size=8MB, order=decreasing, cmp=inline, test=2, time=0.024963 NOTICE: [branchless] size=8MB, order=decreasing, cmp=branchless, test=0, time=0.036423 NOTICE: [SQL inlined] size=8MB, order=decreasing, cmp=SQL-inline, test=0, time=0.125182 NOTICE: [SQL inlined reverse] size=8MB, order=increasing, cmp=SQL-inline-rev, test=0, time=0.127051 -- Since I have a couple more planned tests, I'll keep a running tally on the current state of the patch set so that summaries are not scattered over many emails: 0001 - bsearch and unique is good to have, and we can keep the return type pending further tests 0002/3 - I've yet to see a case where branchless comparators win, but other than that, these are good. Notational improvement and not performance sensitive. 0004/5 - Computing the arguments slows it down, but accessing the underlying int16s gives an improvement. [1] Haven't done an in-situ test on VACUUM. Could be worth it for pg15, since I imagine the proposals for dead tuple storage won't be ready this cycle. 0006 - I expect this to be slower too. I also wonder if this could also use the global function in 0004 once it's improved. 0007 - untested 0008 - Good performance in microbenchmarks, no in-situ testing. Inlined reversal is not worth the binary space or notational overhead. 0009 - Based on 0004, I would guess that computing the arguments is too slow. Not sure how to test in-situ to see if specializing helps. 0010 - Thresholds on my TODO list. 0011 - A simple correction -- I'll go ahead and commit this. v3-0001 comparators for abbreviated keys - Clearly a win, especially for the "unsigned" case [2]. There are still possible improvements, but they seem like a pg16 project(s). [1] https://www.postgresql.org/message-id/CA%2BhUKG%2BS5SMoG8Z2PHj0bsK70CxVLgqQR1orQJq6Cjgibu26vA%40mail.gmail.com [2] https://www.postgresql.org/message-id/CAFBsxsEFGAJ9eBpQVb5a86BE93WER3497zn2OT5wbjm1HHcqgA%40mail.gmail.com (I just realized in that message I didn't attach the script for that, and also attached an extra draft spreadsheet. I'll improve the tests and rerun later) -- John Naylor EDB: http://www.enterprisedb.com
NOTICE: [traditional qsort] size=8MB, order=random, cmp=arg, test=0, time=0.144545 NOTICE: [traditional qsort] size=8MB, order=random, cmp=arg, test=1, time=0.140534 NOTICE: [traditional qsort] size=8MB, order=random, cmp=arg, test=2, time=0.140526 NOTICE: [inlined] size=8MB, order=random, cmp=inline, test=0, time=0.085023 NOTICE: [inlined] size=8MB, order=random, cmp=inline, test=1, time=0.086370 NOTICE: [inlined] size=8MB, order=random, cmp=inline, test=2, time=0.084781 NOTICE: [branchless] size=8MB, order=random, cmp=branchless, test=0, time=0.091837 NOTICE: [branchless] size=8MB, order=random, cmp=branchless, test=1, time=0.090426 NOTICE: [branchless] size=8MB, order=random, cmp=branchless, test=2, time=0.091245 NOTICE: [SQL arg] size=8MB, order=random, cmp=SQL-arg, test=0, time=0.257024 NOTICE: [SQL arg] size=8MB, order=random, cmp=SQL-arg, test=1, time=0.256487 NOTICE: [SQL arg] size=8MB, order=random, cmp=SQL-arg, test=2, time=0.256708 NOTICE: [SQL inlined] size=8MB, order=random, cmp=SQL-inline, test=0, time=0.192063 NOTICE: [SQL inlined] size=8MB, order=random, cmp=SQL-inline, test=1, time=0.191919 NOTICE: [SQL inlined] size=8MB, order=random, cmp=SQL-inline, test=2, time=0.192018 NOTICE: [SQL arg reverse] size=8MB, order=random, cmp=SQL-arg-rev, test=0, time=0.251948 NOTICE: [SQL arg reverse] size=8MB, order=random, cmp=SQL-arg-rev, test=1, time=0.251733 NOTICE: [SQL arg reverse] size=8MB, order=random, cmp=SQL-arg-rev, test=2, time=0.251913 NOTICE: [SQL inlined reverse] size=8MB, order=random, cmp=SQL-inline-rev, test=0, time=0.190797 NOTICE: [SQL inlined reverse] size=8MB, order=random, cmp=SQL-inline-rev, test=1, time=0.191047 NOTICE: [SQL inlined reverse] size=8MB, order=random, cmp=SQL-inline-rev, test=2, time=0.190807 NOTICE: [traditional qsort] size=8MB, order=increasing, cmp=arg, test=0, time=0.001608 NOTICE: [traditional qsort] size=8MB, order=increasing, cmp=arg, test=1, time=0.001608 NOTICE: [traditional qsort] size=8MB, order=increasing, cmp=arg, test=2, time=0.001607 NOTICE: [inlined] size=8MB, order=increasing, cmp=inline, test=0, time=0.000588 NOTICE: [inlined] size=8MB, order=increasing, cmp=inline, test=1, time=0.000585 NOTICE: [inlined] size=8MB, order=increasing, cmp=inline, test=2, time=0.000623 NOTICE: [branchless] size=8MB, order=increasing, cmp=branchless, test=0, time=0.000729 NOTICE: [branchless] size=8MB, order=increasing, cmp=branchless, test=1, time=0.001734 NOTICE: [branchless] size=8MB, order=increasing, cmp=branchless, test=2, time=0.000762 NOTICE: [SQL arg] size=8MB, order=increasing, cmp=SQL-arg, test=0, time=0.005885 NOTICE: [SQL arg] size=8MB, order=increasing, cmp=SQL-arg, test=1, time=0.005934 NOTICE: [SQL arg] size=8MB, order=increasing, cmp=SQL-arg, test=2, time=0.006892 NOTICE: [SQL inlined] size=8MB, order=increasing, cmp=SQL-inline, test=0, time=0.004524 NOTICE: [SQL inlined] size=8MB, order=increasing, cmp=SQL-inline, test=1, time=0.004509 NOTICE: [SQL inlined] size=8MB, order=increasing, cmp=SQL-inline, test=2, time=0.004523 NOTICE: [SQL arg reverse] size=8MB, order=increasing, cmp=SQL-arg-rev, test=0, time=0.160391 NOTICE: [SQL arg reverse] size=8MB, order=increasing, cmp=SQL-arg-rev, test=1, time=0.159284 NOTICE: [SQL arg reverse] size=8MB, order=increasing, cmp=SQL-arg-rev, test=2, time=0.159275 NOTICE: [SQL inlined reverse] size=8MB, order=increasing, cmp=SQL-inline-rev, test=0, time=0.127051 NOTICE: [SQL inlined reverse] size=8MB, order=increasing, cmp=SQL-inline-rev, test=1, time=0.126240 NOTICE: [SQL inlined reverse] size=8MB, order=increasing, cmp=SQL-inline-rev, test=2, time=0.125769 NOTICE: [traditional qsort] size=8MB, order=decreasing, cmp=arg, test=0, time=0.069454 NOTICE: [traditional qsort] size=8MB, order=decreasing, cmp=arg, test=1, time=0.068313 NOTICE: [traditional qsort] size=8MB, order=decreasing, cmp=arg, test=2, time=0.068701 NOTICE: [inlined] size=8MB, order=decreasing, cmp=inline, test=0, time=0.022330 NOTICE: [inlined] size=8MB, order=decreasing, cmp=inline, test=1, time=0.023942 NOTICE: [inlined] size=8MB, order=decreasing, cmp=inline, test=2, time=0.024963 NOTICE: [branchless] size=8MB, order=decreasing, cmp=branchless, test=0, time=0.036423 NOTICE: [branchless] size=8MB, order=decreasing, cmp=branchless, test=1, time=0.034867 NOTICE: [branchless] size=8MB, order=decreasing, cmp=branchless, test=2, time=0.035467 NOTICE: [SQL arg] size=8MB, order=decreasing, cmp=SQL-arg, test=0, time=0.166501 NOTICE: [SQL arg] size=8MB, order=decreasing, cmp=SQL-arg, test=1, time=0.165822 NOTICE: [SQL arg] size=8MB, order=decreasing, cmp=SQL-arg, test=2, time=0.165749 NOTICE: [SQL inlined] size=8MB, order=decreasing, cmp=SQL-inline, test=0, time=0.125182 NOTICE: [SQL inlined] size=8MB, order=decreasing, cmp=SQL-inline, test=1, time=0.126536 NOTICE: [SQL inlined] size=8MB, order=decreasing, cmp=SQL-inline, test=2, time=0.122951 NOTICE: [SQL arg reverse] size=8MB, order=decreasing, cmp=SQL-arg-rev, test=0, time=0.005721 NOTICE: [SQL arg reverse] size=8MB, order=decreasing, cmp=SQL-arg-rev, test=1, time=0.005725 NOTICE: [SQL arg reverse] size=8MB, order=decreasing, cmp=SQL-arg-rev, test=2, time=0.006157 NOTICE: [SQL inlined reverse] size=8MB, order=decreasing, cmp=SQL-inline-rev, test=0, time=0.004404 NOTICE: [SQL inlined reverse] size=8MB, order=decreasing, cmp=SQL-inline-rev, test=1, time=0.004415 NOTICE: [SQL inlined reverse] size=8MB, order=decreasing, cmp=SQL-inline-rev, test=2, time=0.006027
diff --git a/src/test/modules/test_sort_perf/Makefile b/src/test/modules/test_sort_perf/Makefile index ab7f8e24b0..e70187f3a6 100644 --- a/src/test/modules/test_sort_perf/Makefile +++ b/src/test/modules/test_sort_perf/Makefile @@ -6,12 +6,15 @@ DATA = test_sort_perf--1.0.sql first: all -test_sort_perf.o: test_sort_object_include.c test_sort_itemptr_include.c +#test_sort_perf.o: test_sort_object_include.c test_sort_itemptr_include.c test_sort_int64_include.c +test_sort_perf.o: test_sort_int64_include.c -test_sort_object_include.c: make-object-tests.sh - ./make-object-tests.sh > test_sort_object_include.c -test_sort_itemptr_include.c: make-itemptr-tests.sh - ./make-itemptr-tests.sh > test_sort_itemptr_include.c +# test_sort_object_include.c: make-object-tests.sh +# ./make-object-tests.sh > test_sort_object_include.c +# test_sort_itemptr_include.c: make-itemptr-tests.sh +# ./make-itemptr-tests.sh > test_sort_itemptr_include.c +test_sort_int64_include.c: make-int32-tests.sh + ./make-int32-tests.sh > test_sort_int32_include.c ifdef USE_PGXS PG_CONFIG = pg_config diff --git a/src/test/modules/test_sort_perf/make-int32-tests.sh b/src/test/modules/test_sort_perf/make-int32-tests.sh new file mode 100755 index 0000000000..a861ef608a --- /dev/null +++ b/src/test/modules/test_sort_perf/make-int32-tests.sh @@ -0,0 +1,151 @@ +#!/bin/sh + +cat <<EOF +#include "utils/builtins.h" + +EOF + +# generate the function that runs all the tests +echo "static void" +echo "do_sort_int32(int nmegs)" +echo "{" +echo " size_t nobjects = (nmegs * 1024 * 1024) / sizeof(Datum);" +echo " Datum *unsorted = malloc(sizeof(Datum) * nobjects);" +echo " Datum *sorted = malloc(sizeof(Datum) * nobjects);" +echo + +# for btree +cat <<EOF + + BTSortArrayContext cxt; + RegProcedure cmp_proc ; + + // to keep from pulling in nbtree.h +#define BTORDER_PROC 1 + + cmp_proc = get_opfamily_proc(INTEGER_BTREE_FAM_OID, + INT4OID, + INT4OID, + BTORDER_PROC); + + fmgr_info(cmp_proc, &cxt.flinfo); + cxt.collation = DEFAULT_COLLATION_OID; + // we set cxt.reverse later + +EOF + +for order in random increasing decreasing ; do + echo " for (size_t i = 0; i < nobjects; ++i)" + echo " {" + if [ "$order" = "random" ] ; then + echo " int v = random();" + elif [ "$order" = "increasing" ] ; then + echo " int v = i + 16;" + elif [ "$order" = "decreasing" ] ; then + echo " int v = INT_MAX - i;" + fi + echo " unsorted[i] = Int32GetDatum(v);" +echo " }" +echo + +# traditional qsort +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort(sorted, nobjects, sizeof(Datum), btint4fastcmp);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[traditional qsort] size=%dMB, order=$order, cmp=arg, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" + +# inlined +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_int32(sorted, nobjects);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[inlined] size=%dMB, order=$order, cmp=inline, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" + +# inlined and branchless +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_int32_branchless(sorted, nobjects);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[branchless] size=%dMB, order=$order, cmp=branchless, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" + +### B-tree forward + +echo "cxt.reverse = false;" + +# B-tree SQL-callable qsort arg +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_arg((void *) sorted, nobjects, sizeof(Datum), _bt_compare_array_elements, (void *) &cxt);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[SQL arg] size=%dMB, order=$order, cmp=SQL-arg, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" + +# B-tree SQL-callable inlined +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_bt_array_elements(sorted, nobjects, &cxt);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[SQL inlined] size=%dMB, order=$order, cmp=SQL-inline, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" +echo + +### Btree reversed + +echo "cxt.reverse = true;" + +# B-tree SQL-callable qsort arg (same as above, but with reversed comparator) +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_arg((void *) sorted, nobjects, sizeof(Datum), _bt_compare_array_elements, (void *) &cxt);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[SQL arg reverse] size=%dMB, order=$order, cmp=SQL-arg-rev, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" + +# B-tree SQL-callable inlined reversed +echo " for (int i = 0; i < 3; ++i)" +echo " {" +echo " instr_time start_time, end_time;" +echo " memcpy(sorted, unsorted, sizeof(Datum) * nobjects);" +echo " INSTR_TIME_SET_CURRENT(start_time);" +echo " qsort_bt_array_elements_reverse(sorted, nobjects, &cxt);" +echo " INSTR_TIME_SET_CURRENT(end_time);" +echo " INSTR_TIME_SUBTRACT(end_time, start_time);" +echo " elog(NOTICE, \"[SQL inlined reverse] size=%dMB, order=$order, cmp=SQL-inline-rev, test=%d, time=%f\", nmegs, i, INSTR_TIME_GET_DOUBLE(end_time));" +echo " }" +echo + + + +done; + +echo " free(sorted);" +echo " free(unsorted);" +echo "}" diff --git a/src/test/modules/test_sort_perf/test_sort_perf--1.0.sql b/src/test/modules/test_sort_perf/test_sort_perf--1.0.sql index 953717f8ba..15203d9382 100644 --- a/src/test/modules/test_sort_perf/test_sort_perf--1.0.sql +++ b/src/test/modules/test_sort_perf/test_sort_perf--1.0.sql @@ -1,2 +1,3 @@ -CREATE FUNCTION test_sort_object() RETURNS void AS 'MODULE_PATHNAME' LANGUAGE C; -CREATE FUNCTION test_sort_itemptr() RETURNS void AS 'MODULE_PATHNAME' LANGUAGE C; +--CREATE FUNCTION test_sort_object() RETURNS void AS 'MODULE_PATHNAME' LANGUAGE C; +--CREATE FUNCTION test_sort_itemptr() RETURNS void AS 'MODULE_PATHNAME' LANGUAGE C; +CREATE FUNCTION test_sort_int32(int4) RETURNS void AS 'MODULE_PATHNAME' LANGUAGE C; diff --git a/src/test/modules/test_sort_perf/test_sort_perf.c b/src/test/modules/test_sort_perf/test_sort_perf.c index 05a10924f3..177d2b1c35 100644 --- a/src/test/modules/test_sort_perf/test_sort_perf.c +++ b/src/test/modules/test_sort_perf/test_sort_perf.c @@ -2,12 +2,18 @@ #include "funcapi.h" #include "catalog/index.h" +#include "catalog/pg_collation.h" +#include "catalog/pg_type_d.h" +#include "catalog/pg_opfamily_d.h" #include "miscadmin.h" #include "portability/instr_time.h" #include "storage/itemptr.h" +#include "utils/lsyscache.h" #include <stdlib.h> + +/* static int itemptr_comparator(const void *a, const void *b) { @@ -27,14 +33,106 @@ itemptr_comparator(const void *a, const void *b) if (oa > ob) return 1; return 0; +}*/ + + +// standard comparators + +// comparator for qsort_arg() +// XXX rewritten from the version in nbtutils.c +static int +btint4fastcmp(const void * x, const void * y) +{ + int32 *a = (int32 *) x; + int32 *b = (int32 *) y; + + if (*a > *b) + return 1; + else if (*a == *b) + return 0; + else + return -1; } +// qsort with inlined comparator +#define ST_SORT qsort_int32 +#define ST_ELEMENT_TYPE Datum +#define ST_COMPARE(a, b) (btint4fastcmp(a, b)) +#define ST_SCOPE static +#define ST_DEFINE +#define ST_DECLARE +#include "lib/sort_template.h" + +// qsort with branchless comparator +#define ST_SORT qsort_int32_branchless +#define ST_ELEMENT_TYPE Datum +#define ST_COMPARE(a, b) ((int64) *(a) - (int64) *(b)) +#define ST_COMPARE_RET_TYPE int64 +#define ST_SCOPE static +#define ST_DEFINE +#define ST_DECLARE +#include "lib/sort_template.h" + + +// SQL-callable comparators + +typedef struct BTSortArrayContext +{ + FmgrInfo flinfo; + Oid collation; + bool reverse; +} BTSortArrayContext; + +// comparator for qsort arg +static int +_bt_compare_array_elements(const void *a, const void *b, void *arg) +{ + Datum da = *((const Datum *) a); + Datum db = *((const Datum *) b); + BTSortArrayContext *cxt = (BTSortArrayContext *) arg; + int32 compare; + + compare = DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, + cxt->collation, + da, db)); + if (cxt->reverse) + INVERT_COMPARE_RESULT(compare); + return compare; +} + + +/* Define a specialized sort function for _bt_sort_array_elements. */ +#define ST_SORT qsort_bt_array_elements +//#define ST_UNIQUE unique_bt_array_elements +#define ST_ELEMENT_TYPE Datum +#define ST_COMPARE(a, b, cxt) \ + DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b)) +#define ST_COMPARE_ARG_TYPE BTSortArrayContext +#define ST_SCOPE static +#define ST_DEFINE +#include "lib/sort_template.h" + +// inline the reversal +#define ST_SORT qsort_bt_array_elements_reverse +//#define ST_UNIQUE unique_bt_array_elements_reverse +#define ST_ELEMENT_TYPE Datum +#define ST_COMPARE(a, b, cxt) \ + (0 - (DatumGetInt32(FunctionCall2Coll(&cxt->flinfo, cxt->collation, *a, *b)))) +#define ST_COMPARE_RET_TYPE int64 +#define ST_COMPARE_ARG_TYPE BTSortArrayContext +#define ST_SCOPE static +#define ST_DEFINE +#include "lib/sort_template.h" + + PG_MODULE_MAGIC; /* include the generated code */ -#include "test_sort_object_include.c" -#include "test_sort_itemptr_include.c" +// #include "test_sort_object_include.c" +// #include "test_sort_itemptr_include.c" +#include "test_sort_int32_include.c" +/* PG_FUNCTION_INFO_V1(test_sort_object); PG_FUNCTION_INFO_V1(test_sort_itemptr); @@ -53,3 +151,15 @@ test_sort_itemptr(PG_FUNCTION_ARGS) PG_RETURN_NULL(); } +*/ + + +PG_FUNCTION_INFO_V1(test_sort_int32); +Datum +test_sort_int32(PG_FUNCTION_ARGS) +{ + const int32 nmegs = PG_GETARG_INT32(0); + + do_sort_int32(nmegs); + PG_RETURN_NULL(); +}