> On 2 Dec 2024, at 08:39, John Naylor <johncnaylo...@gmail.com> wrote: > > On Mon, Dec 2, 2024 at 1:12 AM Andrey M. Borodin <x4...@yandex-team.ru> wrote: >> >>> On 25 Nov 2024, at 17:50, John Naylor <johncnaylo...@gmail.com> wrote: >>> >>> I'd like to see the two sort specializations combined >>> into one, using a local comparator that knows when to reverse the >>> comparison result (hope that makes sense). >> >> Sure, please find attached. >> The prototype looks somewhat ugly (we pass bool* ascending instead of bool >> ascending) and it cost us ~2% of performance (on my MacBook Air M3). > > I haven't tried to reproduce, but the comparison function has a > different style for DESC than the tuplesort comparators, and the style > here has worse code density, at least in isolation. You can see the > difference in the link below. I also found a way to make the cmp > reversal branch-free (last example). That may not survive once it's > inlined, of course, or could make things worse, but you can try these > if you like: > > https://godbolt.org/z/nfPMT7Enr
On my machine this test \timing on SELECT (sort(arr))[1] FROM arrays_to_sort;\watch 0 c=5 produces sort_int32_cmp Time: 543.690 ms sort_int32_cmp_2 Time: 609.019 ms sort_int32_cmp_4 Time: 612.219 ms So, I'd stick with sort_int32_cmp. But, perhaps, on Intel we might have different results. > >> But is not more generic. > > A lot of the churn is from v1, and made worse by v2, and that seems to > be from getting rid of the QSORT macro: > > @@ -227,9 +228,10 @@ Datum > sort_asc(PG_FUNCTION_ARGS) > { > ArrayType *a = PG_GETARG_ARRAYTYPE_P_COPY(0); > + bool ascending = true; > > CHECKARRVALID(a); > - QSORT(a, 1); > + sort_int32(ARRPTR(a), ARRNELEMS(a), &ascending); > PG_RETURN_POINTER(a); > } > > The macro hides some details -- can we put "ascending" inside there? Done. Best regards, Andrey Borodin.
v3-0001-Use-specialized-sort-facilities.patch
Description: Binary data