Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Masahiro Ikeda Tue, 19 Nov 2024 00:31:22 -0800

Hi,

Apologies for the delayed response. I've confirmed that the costing issignificantly

improved for multicolumn indexes in the case I provided. Thanks!
https://www.postgresql.org/message-id/TYWPR01MB10982A413E0EC4088E78C0E11B1A62%40TYWPR01MB10982.jpnprd01.prod.outlook.com


I have some comments and questions regarding the v14 patch.

(1)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch


As a user, I agree with adding a counter for skip scans
(though it might be better to reply to the different thread).

I believe this will help users determine whether the skip scanoptimization iseffective. If the ratio of (actual rows)/(index searches) becomes small,itsuggests that many traversals are occurring, indicating that the skipscanoptimization is not working efficiently. In such cases, users mightbenefitfrom replacing the multi-column index with an optimized leading columnindex.


IIUC, why not add it to the documentation? It would clearly help users
understand how to tune their queries using the counter, and it would
also show that the counter is not just for developers.

(2)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch

From the perspective of consistency, wouldn't it be better to align thenamingbetween the EXPLAIN output and pg_stat_all_indexes.idx_scan, even thoughthe

documentation states they refer to the same concept?

I personally prefer something like "search" instead of "scan", as "scan"iscommonly associated with node names like Index Scan and similar terms.To maintain

consistency, how about renaming pg_stat_all_indexes.idx_scan to
pg_stat_all_indexes.idx_search?

(3)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch


The counter should be added in blgetbitmap().

(4)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
doc/src/sgml/bloom.sgml


The below forgot "Index Searches: 1".

-> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04rows=500 width=0) (never executed)

               Index Cond: (i2 = 898732)
 Planning Time: 0.491 ms
 Execution Time: 0.055 ms
(10 rows)

Although we may not need to fix it, due to the support for skip scan,the B-tree

index is now selected over the Bloom index in my environment.

(5)

v14-0002-Add-skip-scan-to-nbtree.patch

Although I tested with various data types such as int, uuid, oid, andothers on mylocal PC, I could only identify the regression case that you alreadymentioned.

The case is that Index(Only)Scan is selected instead of SeqScan.Although the performanceof the full index scan is almost the same as that of SeqScan, the skipscan degrades bya factor of 4, despite the number of accessed pages being almost thesame. The flamegraphindicates that the ratio of _bt_advance_array_keys() and_bt_tuple_before_array_skeys()

becomes high.

Although it's not an optimal solution and would only reduce the degreeof performancedegradation, how about introducing a threshold per page to switch fromskip scan to fullindex scan? For example, switch to full index scan after_bt_tuple_before_array_skeys() in_bt_checkkeys() fails 30 times per page. While 30 is just an example, Ireferenced thefollowing case, which seems close to the worst-case scenario. In thatcase,_bt_advance_array_keys() and _bt_tuple_before_array_skeys() are calledalmost 333 times perpage (1,000,000 records / 2,636 pages), and 30 is about 1/10 of thatnumber.


# Example case (see test.sql)
# master: 51.612ms
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------

Gather (cost=1000.00..10633.43 rows=1 width=8) (actualtime=0.160..51.612 rows=1 loops=1)

   Output: id1, id2
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4425

-> Parallel Seq Scan on public.t (cost=0.00..9633.33 rows=1width=8) (actual time=31.728..48.378 rows=0 loops=3)

         Output: id1, id2
         Filter: (t.id2 = 100)
         Rows Removed by Filter: 333333
         Buffers: shared hit=4425
         Worker 0:  actual time=47.583..47.583 rows=0 loops=1
           Buffers: shared hit=1415
         Worker 1:  actual time=47.568..47.569 rows=0 loops=1
           Buffers: shared hit=1436
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.078 ms
 Execution Time: 51.630 ms
(17 rows)

# master with v14patch: 199.328ms. 4x slower
SET skipscan_prefix_cols=32;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------

Index Only Scan using t_idx on public.t (cost=0.42..3535.75 rows=1width=8) (actual time=0.090..199.328 rows=1 loops=1)

   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.069 ms
 Execution Time: 199.348 ms
(9 rows)

# master with v14patch: 54.665ms.
SET skipscan_prefix_cols=0;  // disable skip scan
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------

Index Only Scan using t_idx on public.t (cost=0.42..3535.75 rows=1width=8) (actual time=0.031..54.665 rows=1 loops=1)

   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.067 ms
 Execution Time: 54.685 ms
(9 rows)


(6)

v14-0002-Add-skip-scan-to-nbtree.patch
src/backend/access/nbtree/nbtutils.c


+static int

+_bt_preprocess_num_array_keys(IndexScanDesc scan, BTSkipPreproc*skipatts,

+                                                         int *numSkipArrayKeys)
(snip)
+               int                     prev_numSkipArrayKeys = 
*numSkipArrayKeys;
+
+               /*
+                * Backfill skip arrays for any wholly omitted attributes prior 
to
+                * attno_inkey
+                */
+               while (attno_skip < attno_inkey)

Is it better to move prev_numSkipArrayKeys =*numSkipArrayKeys after thewhile loop?For example, the index below should return *numSkipArrayKeys = 0 insteadof 1

if the id3 type does not support eq_op.

* index: CREATE INDEX test_idx on TEST (id1 int, id2 int, id3 no_eq_op,id4 int);

* query: SELECT * FROM test WHERE id4 = 10;


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION

DROP TABLE IF EXISTS t;
CREATE TABLE t (id1 int, id2 int);
-- 5x fast
-- INSERT INTO t (SELECT 1, i%101 FROM generate_series(1,1_000_000) s(i)); 
-- 4x slow
INSERT INTO t (SELECT i, i FROM generate_series(1,1_000_000) s(i));
CREATE INDEX t_idx on t (id1, id2);
VACUUM FREEZE ANALYZE;
SET skipscan_prefix_cols=32;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100; -- cache
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100;
SET skipscan_prefix_cols=0;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100;
DROP INDEX t_idx;

Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Reply via email to