Hi list, I've attached a patch that implements SortSupport for the inet/cidr types. It has the effect of typically reducing the time taken to sort these types by ~50-60% (as measured by `SELECT COUNT(DISTINCT ...)` which will carry over to common operations like index creation, `ORDER BY`, and `DISTINCT`.
As with other SortSupport implementations, this one proposes an abbreviated key design that packs in as much sorting-relevant information into the available datum as possible while remaining faithful to the existing sorting rules for the types. The key format depends on datum size and whether working with IPv4 or IPv6. In most cases that involves storing as many netmask bits as we can fit, but we can do something a little more complete with IPv4 on 64-bit — because inet data is small and the datum is large, we can store enough information for a successful key-only comparison in the majority of cases. Precise details including background and bit-level diagrams are included in the comments of the attached patch. I've tried to take a variety of steps to build confidence that the proposed SortSupport-based sort is correct: 1. It passes the existing inet regression suite (which was pretty complete already). 2. I added a few new test cases the suite, specifically trying to target edge cases like the minimums and maximums of each abbreviated key bit "boundary". The new cases were run against the master branch to double-check that they're right. 3. I've added a variety of assertions throughout the implementation to make sure that each step is seeing inputs with expected parameters, and only manipulates the bits that it's supposed to be manipulating. 4. I built large random data sets (10M rows) and sorted them against a development build to try and trigger the aforementioned assertions. 5. I built an index on 10M values and ran amcheck against it. 6. I wrote some unit tests to verify that the bit-level representation of the new abbreviated keys are indeed what we expect. They're available here [3]. I didn't include them in the patch because I couldn't find an easy way to produce an expected `.out` file for a 32-bit machine (experiments building with `setarch` didn't succeed). They might be overkill anyway. I can continue to pursue getting them working if reviewers think they'd be useful. My benchmarking methodology and script is available here [1], and involves gathering statistics for 100 `count(distinct ...)` queries at various data sizes. I've saved the results I got on my machine here [2]. Hat tip to Peter Geoghegan for advising on an appropriate abbreviated key design and helping answer many questions along the way. Brandur
SortSupport-for-inet-cidr-types-v1.patch
Description: Binary data