Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Aleksander Alekseev Tue, 02 Jul 2024 05:53:32 -0700

Hi Peter,

> Attached is a POC patch that adds skip scan to nbtree. The patch
> teaches nbtree index scans to efficiently use a composite index on
> '(a, b)' for queries with a predicate such as "WHERE b = 5". This is
> feasible in cases where the total number of distinct values in the
> column 'a' is reasonably small (think tens or hundreds, perhaps even
> thousands for very large composite indexes).
>
> [...]
>
> Thoughts?


Many thanks for working on this. I believe it is an important feature
and it would be great to deliver it during the PG18 cycle.

I experimented with the patch and here are the results I got so far.

Firstly, it was compiled on Intel MacOS and ARM Linux. All the tests
pass just fine.

Secondly, I tested the patch manually using a release build on my
Raspberry Pi 5 and the GUCs that can be seen in [1].

Test 1 - simple one.

```
CREATE TABLE test1(c char, n bigint);
CREATE INDEX test1_idx ON test1 USING btree(c,n);

INSERT INTO test1
  SELECT chr(ascii('a') + random(0,2)) AS c,
         random(0, 1_000_000_000) AS n
  FROM generate_series(0, 1_000_000);

EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test1 WHERE n > 900_000_000;
```

Test 2 - a more complicated one.

```
CREATE TABLE test2(c1 char, c2 char, n bigint);
CREATE INDEX test2_idx ON test2 USING btree(c1,c2,n);

INSERT INTO test2
  SELECT chr(ascii('a') + random(0,2)) AS c1,
         chr(ascii('a') + random(0,2)) AS c2,
         random(0, 1_000_000_000) AS n
  FROM generate_series(0, 1_000_000);

EXPLAIN [ANALYZE] SELECT COUNT(*) FROM test2 WHERE n > 900_000_000;
```

Test 3 - to see how it works with covering indexes.

```
CREATE TABLE test3(c char, n bigint, s text DEFAULT 'text_value' || n);
CREATE INDEX test3_idx ON test3 USING btree(c,n) INCLUDE(s);

INSERT INTO test3
  SELECT chr(ascii('a') + random(0,2)) AS c,
         random(0, 1_000_000_000) AS n,
         'text_value_' || random(0, 1_000_000_000) AS s
  FROM generate_series(0, 1_000_000);

EXPLAIN [ANALYZE] SELECT s FROM test3 WHERE n < 1000;
```

In all the cases the patch worked as expected.

I noticed that with the patch we choose Index Only Scans for Test 1
and without the patch - Parallel Seq Scan. However the Parallel Seq
Scan is 2.4 times faster. Before the patch the query takes 53 ms,
after the patch - 127 ms. I realize this could be just something
specific to my hardware and/or amount of data.

Do you think this is something that was expected or something worth
investigating further?

I haven't looked at the code yet.

[1]: https://github.com/afiskon/pgscripts/blob/master/single-install-meson.sh

-- 
Best regards,
Aleksander Alekseev

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Reply via email to