Attached patch has EXPLAIN ANALYZE display the total number of primitive index scans for all 3 kinds of index scan node. This is useful for index scans that happen to use SAOP arrays. It also seems almost essential to offer this kind of instrumentation for the skip scan patch [1]. Skip scan works by reusing all of the Postgres 17 work (see commit 5bf748b8) to skip over irrelevant sections of a composite index with a low cardinality leading column, so it has all the same issues.
One reason to have this patch is to differentiate between similar cases involving simple SAOP arrays. The user will have some reasonable way of determining how a query such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1, 2, 3, 4, 5); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.007..0.008 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1,2,3,4,5}'::integer[])) │ │ Primitive Index Scans: 1 │ │ Buffers: shared hit=4 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────┘ (4 rows) ...differs from a similar query, such as this: pg@regression:5432 [2070325]=# explain (analyze, buffers, costs off, summary off) select abalance from pgbench_accounts where aid in (1000, 2000, 3000, 4000, 5000); ┌──────────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├──────────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual time=0.006..0.012 rows=5 loops=1) │ │ Index Cond: (aid = ANY ('{1000,2000,3000,4000,5000}'::integer[])) │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=20 │ └──────────────────────────────────────────────────────────────────────────────────────────────────────┘ (4 rows) Another reason to have this patch is consistency. We're only showing the user the number of times we've incremented pg_stat_user_tables.idx_scan in each case. The fact that pg_stat_user_tables.idx_scan counts primitive index scans like this is nothing new. That issue was only documented quite recently, as part of the Postgres 17 work, and it seems quite misleading. It's consistent, but not necessarily nonsurprising. Making it readily apparent that there is more than one primitive index scan involved here makes the issue less surprising. Skip scan --------- Here's an example with this EXPLAIN ANALYZE patch applied on top of my skip scan patch [1], using the tenk1 table left behind when the standard regression tests are run: pg@regression:5432 [2070865]=# create index on tenk1 (four, stringu1); CREATE INDEX pg@regression:5432 [2070865]=# explain (analyze, buffers, costs off, summary off) select stringu1 from tenk1 where -- Omitted: the leading column on "four" stringu1 = 'YGAAAA'; ┌───────────────────────────────────────────────────────────────────────────────────────────────────┐ │ QUERY PLAN │ ├───────────────────────────────────────────────────────────────────────────────────────────────────┤ │ Index Only Scan using tenk1_four_stringu1_idx on tenk1 (actual time=0.011..0.017 rows=15 loops=1) │ │ Index Cond: (stringu1 = 'YGAAAA'::name) │ │ Heap Fetches: 0 │ │ Primitive Index Scans: 5 │ │ Buffers: shared hit=11 │ └───────────────────────────────────────────────────────────────────────────────────────────────────┘ (5 rows) Notice that there are 5 primitive index scans here. That's what I'd expect, given that there are exactly 4 distinct "logical subindexes" implied by our use of a leading column on "four" as the scan's skip column. Under the hood, an initial primitive index scan locates the lowest "four" value. There are then 4 additional primitive index scans to locate the next "four" value (needed when the current "four" value gets past the value's "stringu1 = 'YGAAAA'" tuples). Obviously, the cardinality of the leading column predicts the number of primitive index scans at runtime. But it can be much more complicated of a relationship than what I've shown here may suggest. Skewness matters, too. Small clusters of index tuples with unique leading column values will greatly increase column cardinality/ndistinct, without a commensurate increase in the cost of a skip scan (that skips using that column). Those small clusters of unique values will appear on the same few leaf pages. It follows that they cannot substantially increase the number of primitive scans required at runtime -- they'll just be read all together at once. An important goal of my design for skip scan is that we avoid the need for special index paths within the optimizer. Whether or not we skip is always a runtime decision (when a skippable index attribute exists at all). The optimizer needs to know about skipping for costing purposes only -- all of the required optimizer changes are in selfuncs.c. That's why you didn't see some kind of special new index scan node here -- you just saw the number of primitive index scans. My motivation for working on this EXPLAIN ANALYZE patch is primarily skip scan. I don't think that it necessarily matters, though. I think that this patch can be treated as independent work. It would have been weird to not bring it up skip scan even once here, though. Presentation design choices --------------------------- I've used the term "primitive index scan" for this. That is the existing user-visible terminology [2], though I suppose that that could be revisited now. Another quasi-arbitrary design choice: I don't break out primitive index scans for scan nodes with multiple loops (e.g., the inner side of a nested loop join). The count of primitive scans accumulates across index_rescan calls. I did things this way because it felt slightly more logical to follow what we show for "Buffers" -- primitive index scans are another physical cost. I'm certainly not opposed to doing that part differently. It doesn't have to be one or the other (could break it out both ways), if people think that the added verbosity is worth it. I think that we shouldn't be counting calls to _bt_first as a primitive index scan unless they either call _bt_search or _bt_endpoint to descend the index (in the case of nbtree scans). This means that cases where we detect a contradictory qual in _bt_preprocess_keys should count as having zero primitive index scans. That is technically an independent thing, though it seems far more logical to just do it that way. Actually, I think that there might be existing bugs on HEAD, with parallel index scan -- I think we might be overcounting. We're not properly accounting for the fact that parallel workers usually don't perform a primitive index scan when their backend calls into _bt_first. I wonder if I should address that separately, as a bug fix... [1] https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZdn1STSG_y8qP__vggTaPAYXJP%2BG4bw%40mail.gmail.com [2] https://www.postgresql.org/docs/devel/monitoring-stats.html#MONITORING-PG-STAT-ALL-INDEXES-VIEW -- see "Note" box -- Peter Geoghegan
v1-0001-Show-primitive-scan-count-in-EXPLAIN-ANALYZE.patch
Description: Binary data