Hi Peter, > Attached is a POC patch that adds skip scan to nbtree. The patch > teaches nbtree index scans to efficiently use a composite index on > '(a, b)' for queries with a predicate such as "WHERE b = 5". This is > feasible in cases where the total number of distinct values in the > column 'a' is reasonably small (think tens or hundreds, perhaps even > thousands for very large composite indexes). > > [...] > > Thoughts?
Many thanks for working on this. I believe it is an important feature and it would be great to deliver it during the PG18 cycle. I experimented with the patch and here are the results I got so far. Firstly, it was compiled on Intel MacOS and ARM Linux. All the tests pass just fine. Secondly, I tested the patch manually using a release build on my Raspberry Pi 5 and the GUCs that can be seen in [1]. Test 1 - simple one. ``` CREATE TABLE test1(c char, n bigint); CREATE INDEX test1_idx ON test1 USING btree(c,n); INSERT INTO test1 SELECT chr(ascii('a') + random(0,2)) AS c, random(0, 1_000_000_000) AS n FROM generate_series(0, 1_000_000); EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test1 WHERE n > 900_000_000; ``` Test 2 - a more complicated one. ``` CREATE TABLE test2(c1 char, c2 char, n bigint); CREATE INDEX test2_idx ON test2 USING btree(c1,c2,n); INSERT INTO test2 SELECT chr(ascii('a') + random(0,2)) AS c1, chr(ascii('a') + random(0,2)) AS c2, random(0, 1_000_000_000) AS n FROM generate_series(0, 1_000_000); EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test2 WHERE n > 900_000_000; ``` Test 3 - to see how it works with covering indexes. ``` CREATE TABLE test3(c char, n bigint, s text DEFAULT 'text_value' || n); CREATE INDEX test3_idx ON test3 USING btree(c,n) INCLUDE(s); INSERT INTO test3 SELECT chr(ascii('a') + random(0,2)) AS c, random(0, 1_000_000_000) AS n, 'text_value_' || random(0, 1_000_000_000) AS s FROM generate_series(0, 1_000_000); EXPLAIN [ANALYZE] SELECT s FROM test3 WHERE n < 1000; ``` In all the cases the patch worked as expected. I noticed that with the patch we choose Index Only Scans for Test 1 and without the patch - Parallel Seq Scan. However the Parallel Seq Scan is 2.4 times faster. Before the patch the query takes 53 ms, after the patch - 127 ms. I realize this could be just something specific to my hardware and/or amount of data. Do you think this is something that was expected or something worth investigating further? I haven't looked at the code yet. [1]: https://github.com/afiskon/pgscripts/blob/master/single-install-meson.sh -- Best regards, Aleksander Alekseev