Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

Peter Geoghegan Thu, 15 Aug 2024 12:23:19 -0700

Attached patch has EXPLAIN ANALYZE display the total number of
primitive index scans for all 3 kinds of index scan node. This is
useful for index scans that happen to use SAOP arrays. It also seems
almost essential to offer this kind of instrumentation for the skip
scan patch [1]. Skip scan works by reusing all of the Postgres 17 work
(see commit 5bf748b8) to skip over irrelevant sections of a composite
index with a low cardinality leading column, so it has all the same
issues.


One reason to have this patch is to differentiate between similar
cases involving simple SAOP arrays. The user will have some reasonable
way of determining how a query such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
  abalance
from
  pgbench_accounts
where
  aid in (1, 2, 3, 4, 5);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              QUERY PLAN
                                │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.007..0.008 rows=5 loops=1) │
│   Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[]))
                                │
│   Primitive Index Scans: 1
                                │
│   Buffers: shared hit=4
                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

...differs from a similar query, such as this:

pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off,
summary off)
select
  abalance
from
  pgbench_accounts
where
  aid in (1000, 2000, 3000, 4000, 5000);
┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                              QUERY PLAN
                                │
├──────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual
time=0.006..0.012 rows=5 loops=1) │
│   Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[]))
                                │
│   Primitive Index Scans: 5
                                │
│   Buffers: shared hit=20
                                │
└──────────────────────────────────────────────────────────────────────────────────────────────────────┘
(4 rows)

Another reason to have this patch is consistency. We're only showing
the user the number of times we've incremented
pg_stat_user_tables.idx_scan in each case. The fact that
pg_stat_user_tables.idx_scan counts primitive index scans like this is
nothing new. That issue was only documented quite recently, as part of
the Postgres 17 work, and it seems quite misleading. It's consistent,
but not necessarily nonsurprising. Making it readily apparent that
there is more than one primitive index scan involved here makes the
issue less surprising.

Skip scan
---------

Here's an example with this EXPLAIN ANALYZE patch applied on top of my
skip scan patch [1], using the tenk1 table left behind when the
standard regression tests are run:

pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1);
CREATE INDEX
pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off,
summary off)
select
  stringu1
from
  tenk1
where
  -- Omitted: the leading column on "four"
  stringu1 = 'YGAAAA';
┌───────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                            QUERY PLAN
                             │
├───────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual
time=0.011..0.017 rows=15 loops=1) │
│   Index Cond: (stringu1 = 'YGAAAA'::name)
                             │
│   Heap Fetches: 0
                             │
│   Primitive Index Scans: 5
                             │
│   Buffers: shared hit=11
                             │
└───────────────────────────────────────────────────────────────────────────────────────────────────┘
(5 rows)

Notice that there are 5 primitive index scans here. That's what I'd
expect, given that there are exactly 4 distinct "logical subindexes"
implied by our use of a leading column on "four" as the scan's skip
column. Under the hood, an initial primitive index scan locates the
lowest "four" value. There are then 4 additional primitive index scans
to locate the next "four" value (needed when the current "four" value
gets past the value's "stringu1 = 'YGAAAA'" tuples).

Obviously, the cardinality of the leading column predicts the number
of primitive index scans at runtime. But it can be much more
complicated of a relationship than what I've shown here may suggest.
Skewness matters, too. Small clusters of index tuples with unique
leading column values will greatly increase column
cardinality/ndistinct, without a commensurate increase in the cost of
a skip scan (that skips using that column). Those small clusters of
unique values will appear on the same few leaf pages. It follows that
they cannot substantially increase the number of primitive scans
required at runtime -- they'll just be read all together at once.

An important goal of my design for skip scan is that we avoid the need
for special index paths within the optimizer. Whether or not we skip
is always a runtime decision (when a skippable index attribute exists
at all). The optimizer needs to know about skipping for costing
purposes only -- all of the required optimizer changes are in
selfuncs.c. That's why you didn't see some kind of special new index
scan node here -- you just saw the number of primitive index scans.

My motivation for working on this EXPLAIN ANALYZE patch is primarily
skip scan. I don't think that it necessarily matters, though. I think
that this patch can be treated as independent work. It would have been
weird to not bring it up skip scan even once here, though.

Presentation design choices
---------------------------

I've used the term "primitive index scan" for this. That is the
existing user-visible terminology [2], though I suppose that that
could be revisited now.

Another quasi-arbitrary design choice: I don't break out primitive
index scans for scan nodes with multiple loops (e.g., the inner side
of a nested loop join). The count of primitive scans accumulates
across index_rescan calls. I did things this way because it felt
slightly more logical to follow what we show for "Buffers" --
primitive index scans are another physical cost. I'm certainly not
opposed to doing that part differently. It doesn't have to be one or
the other (could break it out both ways), if people think that the
added verbosity is worth it.

I think that we shouldn't be counting calls to _bt_first as a
primitive index scan unless they either call _bt_search or
_bt_endpoint to descend the index (in the case of nbtree scans). This
means that cases where we detect a contradictory qual in
_bt_preprocess_keys should count as having zero primitive index scans.
That is technically an independent thing, though it seems far more
logical to just do it that way.

Actually, I think that there might be existing bugs on HEAD, with
parallel index scan -- I think we might be overcounting. We're not
properly accounting for the fact that parallel workers usually don't
perform a primitive index scan when their backend calls into
_bt_first. I wonder if I should address that separately, as a bug
fix...

[1] 
https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP%2BG4bw%40mail.gmail.com
[2] 
https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW
-- see "Note" box
--
Peter Geoghegan

v1-0001-Show-primitive-scan-count-in-EXPLAIN-ANALYZE.patch
Description: Binary data

Showing primitive index scan count in EXPLAIN ANALYZE (for skip scan and SAOP scans)

Reply via email to