On Fri, Jul 2, 2021 at 4:39 AM John Naylor <john.nay...@enterprisedb.com> wrote:

> For item pointers, it made sense to try doing math to reduce the number
> of branches. That made things worse, so let's try the opposite: Increase
> the number of branches so we do less math. In the attached patch
> (applies on top of your 0012 and a .txt to avoid confusing the CF bot),
> I test a new comparator with this approach, and also try a wider range
> of thresholds. The thresholds don't seem to make any noticeable
> difference with this data type, but the new comparator (cmp=ids below)
> gives a nice speedup in this test:
> NOTICE: [traditional qsort] order=random, threshold=7, cmp=32, test=0, time=4.964657
> NOTICE: order=random, threshold=7, cmp=std, test=0, time=2.810627
> NOTICE: order=random, threshold=7, cmp=ids, test=0, time=1.692711

Oooh. So, the awkwardness of the 64 bit maths with unaligned inputs
(even though we obtain all inputs with 16 bit loads) was hurting, and
you realised the same sort of thing might also be happening with the
32 bit version, so you went the other way. (It'd be nice to understand
exactly why.)

I tried your 16 bit comparison version on Intel, AMD and Apple CPUs,
and the results were all in the same ballpark. For random input, I see
something like a ~1.7x speedup over traditional qsort from specialising
(cmp=std), and ~2.7x from going 16 bit (cmp=ids). For increasing and
decreasing input, it's a ~2x speedup from specialising and ~4x from
going 16 bit. Beautiful.

One thing I'm wondering about is whether it's worth having the
machinery for future experimentation, like ST_SORT_SMALL_THRESHOLD and
ST_COMPARE_RET_TYPE, in the tree, or whether we should pare it back to
the minimal changes that definitely produce results. I think I'd like
to keep those changes: even if it may be some time, possibly an
infinite amount, before we figure out how to tune the thresholds
profitably, giving them names instead of using magic numbers seems like
progress.

The Alexandrescu talk was extremely entertaining, thanks.
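For anyone following along, here's a rough sketch of my reading of the
two comparator shapes being compared (cmp=32 vs. cmp=ids), written
against the ItemPointerData layout from storage/itemptr.h but with
standalone stdint types so it compiles on its own; the actual patch may
differ in detail:

/*
 * Sketch (mine, not the patch's code) of the two comparator shapes for
 * the 6-byte ItemPointerData: three uint16 fields, so the struct is
 * only 2-byte aligned.
 */
#include <stdint.h>

typedef struct BlockIdData
{
    uint16_t    bi_hi;
    uint16_t    bi_lo;
} BlockIdData;

typedef uint16_t OffsetNumber;

typedef struct ItemPointerData
{
    BlockIdData ip_blkid;
    OffsetNumber ip_posid;
} ItemPointerData;

/* cmp=32 style: shift/or to rebuild the block number; fewer branches */
static inline int
itemptr_cmp_32(const ItemPointerData *a, const ItemPointerData *b)
{
    uint32_t    ablk = ((uint32_t) a->ip_blkid.bi_hi << 16) | a->ip_blkid.bi_lo;
    uint32_t    bblk = ((uint32_t) b->ip_blkid.bi_hi << 16) | b->ip_blkid.bi_lo;

    if (ablk != bblk)
        return ablk < bblk ? -1 : 1;
    if (a->ip_posid != b->ip_posid)
        return a->ip_posid < b->ip_posid ? -1 : 1;
    return 0;
}

/* cmp=ids style: compare the three 16 bit halves directly; no math */
static inline int
itemptr_cmp_ids(const ItemPointerData *a, const ItemPointerData *b)
{
    if (a->ip_blkid.bi_hi != b->ip_blkid.bi_hi)
        return a->ip_blkid.bi_hi < b->ip_blkid.bi_hi ? -1 : 1;
    if (a->ip_blkid.bi_lo != b->ip_blkid.bi_lo)
        return a->ip_blkid.bi_lo < b->ip_blkid.bi_lo ? -1 : 1;
    if (a->ip_posid != b->ip_posid)
        return a->ip_posid < b->ip_posid ? -1 : 1;
    return 0;
}

Note that the ids version never widens past 16 bits, so every load and
compare is a naturally aligned uint16 operation, which is presumably
what the unaligned 6-byte struct rewards.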
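And for concreteness on the second point, this is roughly how I'd
expect a specialisation to spell those knobs when instantiating
lib/sort_template.h; ST_SORT_SMALL_THRESHOLD and ST_COMPARE_RET_TYPE
come from your patch rather than the tree, so the names and semantics
here are just my understanding from this thread:

/* Hypothetical instantiation; the two "patch knob" macros are from
 * the patch under discussion, not from lib/sort_template.h as it
 * stands in the tree. */
#define ST_SORT sort_itemptrs
#define ST_ELEMENT_TYPE ItemPointerData
#define ST_COMPARE(a, b) itemptr_cmp_ids(a, b)
#define ST_COMPARE_RET_TYPE int     /* patch knob: comparator result type */
#define ST_SORT_SMALL_THRESHOLD 7   /* patch knob: insertion sort cutoff */
#define ST_SCOPE static
#define ST_DEFINE
#include "lib/sort_template.h"

/* ... later: sort_itemptrs(tids, ntids); */

Having those two as named parameters of the template, rather than
constants buried in the sort body, is exactly what makes this kind of
threshold/comparator experimentation cheap to repeat.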