Hi,

Apologies for the delayed response. I've confirmed that the costing is significantly
improved for multicolumn indexes in the case I provided. Thanks!
https://www.postgresql.org/message-id/TYWPR01MB10982A413E0EC4088E78C0E11B1A62%40TYWPR01MB10982.jpnprd01.prod.outlook.com

I have some comments and questions regarding the v14 patch.

(1)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch

As a user, I agree with adding a counter for skip scans
(though it might be better to reply to the different thread).

I believe this will help users determine whether the skip scan optimization is effective. If the ratio of (actual rows)/(index searches) becomes small, it suggests that many traversals are occurring, indicating that the skip scan optimization is not working efficiently. In such cases, users might benefit from replacing the multi-column index with an optimized leading column index.

IIUC, why not add it to the documentation? It would clearly help users
understand how to tune their queries using the counter, and it would
also show that the counter is not just for developers.

(2)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch

From the perspective of consistency, wouldn't it be better to align the naming between the EXPLAIN output and pg_stat_all_indexes.idx_scan, even though the
documentation states they refer to the same concept?

I personally prefer something like "search" instead of "scan", as "scan" is commonly associated with node names like Index Scan and similar terms. To maintain
consistency, how about renaming pg_stat_all_indexes.idx_scan to
pg_stat_all_indexes.idx_search?

(3)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch

The counter should be added in blgetbitmap().

(4)

v14-0001-Show-index-search-count-in-EXPLAIN-ANALYZE.patch
doc/src/sgml/bloom.sgml

The below forgot "Index Searches: 1".

-> Bitmap Index Scan on btreeidx2 (cost=0.00..12.04 rows=500 width=0) (never executed)
               Index Cond: (i2 = 898732)
 Planning Time: 0.491 ms
 Execution Time: 0.055 ms
(10 rows)

Although we may not need to fix it, due to the support for skip scan, the B-tree
index is now selected over the Bloom index in my environment.

(5)

v14-0002-Add-skip-scan-to-nbtree.patch

Although I tested with various data types such as int, uuid, oid, and others on my local PC, I could only identify the regression case that you already mentioned.

The case is that Index(Only)Scan is selected instead of SeqScan. Although the performance of the full index scan is almost the same as that of SeqScan, the skip scan degrades by a factor of 4, despite the number of accessed pages being almost the same. The flamegraph indicates that the ratio of _bt_advance_array_keys() and _bt_tuple_before_array_skeys()
becomes high.

Although it's not an optimal solution and would only reduce the degree of performance degradation, how about introducing a threshold per page to switch from skip scan to full index scan? For example, switch to full index scan after _bt_tuple_before_array_skeys() in _bt_checkkeys() fails 30 times per page. While 30 is just an example, I referenced the following case, which seems close to the worst-case scenario. In that case, _bt_advance_array_keys() and _bt_tuple_before_array_skeys() are called almost 333 times per page (1,000,000 records / 2,636 pages), and 30 is about 1/10 of that number.

# Example case (see test.sql)
# master: 51.612ms
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Gather (cost=1000.00..10633.43 rows=1 width=8) (actual time=0.160..51.612 rows=1 loops=1)
   Output: id1, id2
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4425
-> Parallel Seq Scan on public.t (cost=0.00..9633.33 rows=1 width=8) (actual time=31.728..48.378 rows=0 loops=3)
         Output: id1, id2
         Filter: (t.id2 = 100)
         Rows Removed by Filter: 333333
         Buffers: shared hit=4425
         Worker 0:  actual time=47.583..47.583 rows=0 loops=1
           Buffers: shared hit=1415
         Worker 1:  actual time=47.568..47.569 rows=0 loops=1
           Buffers: shared hit=1436
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.078 ms
 Execution Time: 51.630 ms
(17 rows)

# master with v14patch: 199.328ms. 4x slower
SET skipscan_prefix_cols=32;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on public.t (cost=0.42..3535.75 rows=1 width=8) (actual time=0.090..199.328 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.069 ms
 Execution Time: 199.348 ms
(9 rows)

# master with v14patch: 54.665ms.
SET skipscan_prefix_cols=0;  // disable skip scan
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
Index Only Scan using t_idx on public.t (cost=0.42..3535.75 rows=1 width=8) (actual time=0.031..54.665 rows=1 loops=1)
   Output: id1, id2
   Index Cond: (t.id2 = 100)
   Index Searches: 1
   Heap Fetches: 0
   Buffers: shared hit=2736
 Settings: effective_cache_size = '7932MB', work_mem = '15MB'
 Planning Time: 0.067 ms
 Execution Time: 54.685 ms
(9 rows)


(6)

v14-0002-Add-skip-scan-to-nbtree.patch
src/backend/access/nbtree/nbtutils.c

+static int
+_bt_preprocess_num_array_keys(IndexScanDesc scan, BTSkipPreproc *skipatts,
+                                                         int *numSkipArrayKeys)
(snip)
+               int                     prev_numSkipArrayKeys = 
*numSkipArrayKeys;
+
+               /*
+                * Backfill skip arrays for any wholly omitted attributes prior 
to
+                * attno_inkey
+                */
+               while (attno_skip < attno_inkey)

Is it better to move prev_numSkipArrayKeys =*numSkipArrayKeys after the while loop? For example, the index below should return *numSkipArrayKeys = 0 instead of 1
if the id3 type does not support eq_op.

* index: CREATE INDEX test_idx on TEST (id1 int, id2 int, id3 no_eq_op, id4 int);
* query: SELECT * FROM test WHERE id4 = 10;


Regards,
--
Masahiro Ikeda
NTT DATA CORPORATION
DROP TABLE IF EXISTS t;
CREATE TABLE t (id1 int, id2 int);
-- 5x fast
-- INSERT INTO t (SELECT 1, i%101 FROM generate_series(1,1_000_000) s(i)); 
-- 4x slow
INSERT INTO t (SELECT i, i FROM generate_series(1,1_000_000) s(i));
CREATE INDEX t_idx on t (id1, id2);
VACUUM FREEZE ANALYZE;
SET skipscan_prefix_cols=32;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100; -- cache
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100;
SET skipscan_prefix_cols=0;
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT, SETTINGS, WAL, VERBOSE) SELECT * FROM t 
WHERE id2 = 100;
DROP INDEX t_idx;

Reply via email to