On Tue, Dec 22, 2015 at 10:49 AM, Peter Geoghegan <p...@heroku.com> wrote: > On Tue, Dec 22, 2015 at 9:31 AM, Robert Haas <robertmh...@gmail.com> wrote: >>> It has some nice properties, because unsigned integers are so simple >>> and flexible. For example, we can do it with int4 sometimes, and then >>> compare two attributes at once on 64-bit platforms. Maybe the second >>> attribute (the second set of 4 bytes) isn't an int4, though -- it's >>> any other type's abbreviated key (which can be assumed to have a >>> compatible representation). That's one more advanced possibility. >> >> Yikes. > > I'm not suggesting we'd want to do that immediately, or even at all. > My point is -- stuff like this becomes possible. My intuition is that > it might come up in a bunch of places.
I had a better idea, that is based on the same intuition that we can leverage the flexibility of that standardized representation of abbreviated keys to the hilt. We use one bit to represent whether this is the first run, or a subsequent run (in a world where replacement selection is only used for "quicksort with spillover", and run number isn't that interesting), and another bit to represent NULLness. These are the most significant and second most significant bits, respectively. Fill all remaining bits with your unsigned abbreviated key representation, without regard to the underlying details of the type. Now, you just found a way to cut the size of SortTuples significantly, to just two pointers (whereas with alignment considerations, SortTuples currently consume enough memory to store 3 pointers). You also just found a way of significantly cutting the number of instructions that are needed for the hot code path, because you don't really need an abbreviation-based ApplySortComparator() inline function to interpret things through -- whether or not NULLs are first or last, and how NULLs are considered generally is all baked into the abbreviated representation from the start. This is certainly speculative stuff, and having multiple versions of SortTuple would be inconvenient, but the point again is that there are considerable upsides to a standardized representation (abbreviated keys always being compared as unsigned integers), and no downsides. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers