On 7/23/25 17:09, Andres Freund wrote:
> Hi,
>
> On 2025-07-23 14:50:15 +0200, Tomas Vondra wrote:
>> On 7/23/25 02:59, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2025-07-23 02:50:04 +0200, Tomas Vondra wrote:
>>>> But I don't see why this would have any effect on the prefetch distance,
>>>> queue depth etc. Or why decreasing INDEX_SCAN_MAX_BATCHES should improve
>>>> that. I'd have expected exactly the opposite behavior.
>>>>
>>>> Could be a bug, of course. But it'd be helpful to see the dataset/query.
>>>
>>> Pgbench scale 500, with the simpler query from my message.
>>>
>>
>> I tried to reproduce this, but I'm not seeing this behavior. I'm not sure
>> how you monitor the queue depth (presumably iostat?)
>
> Yes, iostat, since I was looking at what the "actually required" lookahead
> distance is.
>
> Do you actually get the query to be entirely CPU bound? What amount of IO
> waiting do you see EXPLAIN (ANALYZE, TIMING OFF) with track_io_timing=on
> report?
>

No, it definitely needs to wait for I/O (FWIW it's on the xeon, with a
single NVMe SSD).

> Ah - I was using a very high effective_io_concurrency. With a high
> effective_io_concurrency value I see a lot of stalls, even at
> INDEX_SCAN_MAX_BATCHES = 64. And a lower prefetch distance, which seems
> somewhat odd.
>

I think that's a bug in the explain patch. The counters were updated at
the beginning of read_stream_next_buffer(), but that's wrong - a single
call to read_stream_next_buffer() can prefetch multiple blocks. This
skewed the stats, as the prefetches are not counted with "distance=0".
With higher eic this happens sooner, so the average distance seemed to
decrease.

The attached patch does the updates in read_stream_get_block(), which I
think is better. And "stall" now means (distance == 1), which I think
detects requests without prefetching.

I also added a separate "Count" for the actual number of prefetched
blocks, and "Skips" for duplicate blocks skipped (which the read stream
never even sees, because they are skipped in the callback).

>
> FWIW, in my tests I was just evicting lineitem from shared buffers, since I
> wanted to test the heap prefetching, without stalls induced by blocking on
> index reads. But what I described happens with either.
>
> ;SET effective_io_concurrency = 256;SELECT
> pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs
> off, timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts
> ORDER BY aid LIMIT 10000000);
>
>                                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
>  Aggregate (actual rows=1.00 loops=1)
>    Buffers: shared hit=27369 read=164191
>    I/O Timings: shared read=358.795
>    ->  Limit (actual rows=10000000.00 loops=1)
>          Buffers: shared hit=27369 read=164191
>          I/O Timings: shared read=358.795
>          ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1)
>                Index Searches: 1
>                Prefetch Distance: 256.989
>                Prefetch Stalls: 3
>                Prefetch Resets: 3
>                Buffers: shared hit=27369 read=164191
>                I/O Timings: shared read=358.795
>  Planning Time: 0.086 ms
>  Execution Time: 4194.845 ms
>
> ;SET effective_io_concurrency = 512;SELECT
> pg_buffercache_evict_relation('pgbench_accounts'); explain (analyze, costs
> off, timing off) SELECT max(abalance) FROM (SELECT * FROM pgbench_accounts
> ORDER BY aid LIMIT 10000000);
>
>                                                 QUERY PLAN
> ----------------------------------------------------------------------------------------------------------
>  Aggregate (actual rows=1.00 loops=1)
>    Buffers: shared hit=27368 read=164190
>    I/O Timings: shared read=832.515
>    ->  Limit (actual rows=10000000.00 loops=1)
>          Buffers: shared hit=27368 read=164190
>          I/O Timings: shared read=832.515
>          ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1)
>                Index Searches: 1
>                Prefetch Distance: 56.778
>                Prefetch Stalls: 160569
>                Prefetch Resets: 423
>                Buffers: shared hit=27368 read=164190
>                I/O Timings: shared read=832.515
>  Planning Time: 0.084 ms
>  Execution Time: 4413.058 ms
>
> Greetings,
>

The attached v2 explain patch should fix that. I'm also attaching logs
from my explain, for 64 and 16 batches. I think the output makes much
more sense now.

cheers

--
Tomas Vondra

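To make the counting rule described above concrete, here is a minimal standalone sketch of it. It is not the patch code: the PrefetchStats struct and both function names are hypothetical, and it uses long counters where the patch uses int. It only mirrors the rule that a block request with distance > 1 counts as a prefetch, anything else counts as a stall, and the reported "Prefetch Distance" is accum / count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters the patch keeps per read stream. */
typedef struct PrefetchStats
{
    long distance_accum;    /* sum of look-ahead distances of counted requests */
    long distance_count;    /* block requests issued with distance > 1 */
    long distance_stalls;   /* block requests issued with distance <= 1 */
} PrefetchStats;

/*
 * Record one block request. Calling this once per requested block (rather
 * than once per returned buffer) is the point of the fix: a single buffer
 * fetch may trigger several block requests.
 */
static void
prefetch_stats_record(PrefetchStats *stats, int distance)
{
    assert(stats != NULL);

    if (distance > 1)
    {
        stats->distance_accum += distance;
        stats->distance_count += 1;
    }
    else
        stats->distance_stalls += 1;
}

/* Average distance over counted requests; 0.0 if nothing was counted. */
static double
prefetch_stats_avg_distance(const PrefetchStats *stats)
{
    if (stats->distance_count == 0)
        return 0.0;
    return (double) stats->distance_accum / (double) stats->distance_count;
}
```

Note that with this rule stalls are excluded from the average, so a run with many stalls can still report a large "Prefetch Distance"; the two numbers need to be read together.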
====================== eic 1 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191281 I/O Timings: shared read=3341.988 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191281 I/O Timings: shared read=3341.988 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 31.996 Prefetch Count: 163951 Prefetch Stalls: 1 Prefetch Skips: 9837060 Prefetch Resets: 3 Buffers: shared read=191281 I/O Timings: shared read=3341.988 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.595 Planning Time: 4.711 ms Execution Time: 5948.513 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 6046.00 241029.00 0.00 0.00 0.09 39.87 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.57 59.20 nvme0n1p1 6368.00 258944.00 0.00 0.00 0.09 40.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.56 60.00 nvme0n1p1 6323.00 257136.00 0.00 0.00 0.09 40.67 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.57 58.40 nvme0n1p1 6258.00 254328.00 0.00 0.00 0.09 40.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.57 54.80 nvme0n1p1 6414.00 260896.00 0.00 0.00 0.09 40.68 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.56 63.60 ====================== eic 8 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191411 I/O Timings: shared read=763.833 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191411 I/O Timings: shared read=763.833 -> Index Scan 
using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 143.923 Prefetch Count: 164063 Prefetch Stalls: 1 Prefetch Skips: 9843780 Prefetch Resets: 3 Buffers: shared read=191411 I/O Timings: shared read=763.833 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.601 Planning Time: 7.373 ms Execution Time: 3045.476 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 11406.00 458925.00 0.00 0.00 0.09 40.24 14.00 97.50 0.00 0.00 0.07 6.96 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.08 86.00 nvme0n1p1 12564.00 510848.00 0.00 0.00 0.09 40.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.15 90.40 nvme0n1p1 12666.00 515376.00 0.00 0.00 0.09 40.69 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.09 90.80 ====================== eic 16 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191561 I/O Timings: shared read=780.076 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191561 I/O Timings: shared read=780.076 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 271.761 Prefetch Count: 164191 Prefetch Stalls: 1 Prefetch Skips: 9851460 Prefetch Resets: 3 Buffers: shared read=191561 I/O Timings: shared read=780.076 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.581 Planning Time: 7.476 ms Execution Time: 3025.995 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 11731.00 472197.00 0.00 0.00 0.09 40.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 1.09 90.40 nvme0n1p1 12446.00 505944.00 0.00 0.00 0.09 40.65 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.14 90.40 nvme0n1p1 12751.00 518936.00 0.00 0.00 0.09 40.70 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.10 95.60 ====================== eic 32 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1283.840 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1283.840 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 253.411 Prefetch Count: 164010 Prefetch Stalls: 355 Prefetch Skips: 9840600 Prefetch Resets: 357 Buffers: shared read=191348 I/O Timings: shared read=1283.840 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.693 Planning Time: 7.426 ms Execution Time: 3618.736 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 10195.00 398333.00 0.00 0.00 0.11 39.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.12 52.40 nvme0n1p1 10506.00 414456.00 0.00 0.00 0.11 39.45 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.16 54.80 nvme0n1p1 10899.00 430776.00 0.00 0.00 0.11 39.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.17 55.20 ====================== eic 64 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1427.778 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared 
read=191348 I/O Timings: shared read=1427.778 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 253.411 Prefetch Count: 164010 Prefetch Stalls: 355 Prefetch Skips: 9840600 Prefetch Resets: 357 Buffers: shared read=191348 I/O Timings: shared read=1427.778 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=3.138 Planning Time: 7.869 ms Execution Time: 3829.833 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 9217.00 359565.00 0.00 0.00 0.11 39.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.01 48.00 nvme0n1p1 10042.00 396664.00 0.00 0.00 0.11 39.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.07 50.40 nvme0n1p1 10162.00 401184.00 0.00 0.00 0.11 39.48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.11 53.60 ====================== eic 128 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1374.205 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1374.205 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 253.411 Prefetch Count: 164010 Prefetch Stalls: 355 Prefetch Skips: 9840600 Prefetch Resets: 357 Buffers: shared read=191348 I/O Timings: shared read=1374.205 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.979 Planning Time: 7.543 ms Execution Time: 3808.224 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 9641.00 
376724.00 0.00 0.00 0.11 39.08 1.00 9.50 0.00 0.00 0.00 9.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.06 51.20 nvme0n1p1 10193.00 401992.00 0.00 0.00 0.11 39.44 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.10 50.80 nvme0n1p1 10332.00 408664.00 0.00 0.00 0.11 39.55 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.10 51.20 ====================== eic 256 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1330.131 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1330.131 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 253.411 Prefetch Count: 164010 Prefetch Stalls: 355 Prefetch Skips: 9840600 Prefetch Resets: 357 Buffers: shared read=191348 I/O Timings: shared read=1330.131 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.566 Planning Time: 6.547 ms Execution Time: 3674.986 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 10039.00 391733.00 0.00 0.00 0.11 39.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.10 51.60 nvme0n1p1 10422.00 411928.00 0.00 0.00 0.11 39.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.15 53.60 nvme0n1p1 10843.00 427872.00 0.00 0.00 0.11 39.46 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.16 51.20 ====================== eic 512 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191348 I/O Timings: shared 
read=1367.977 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191348 I/O Timings: shared read=1367.977 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 253.411 Prefetch Count: 164010 Prefetch Stalls: 355 Prefetch Skips: 9840600 Prefetch Resets: 357 Buffers: shared read=191348 I/O Timings: shared read=1367.977 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=3.124 Planning Time: 8.065 ms Execution Time: 3828.742 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 9439.00 368301.00 0.00 0.00 0.11 39.02 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.04 51.20 nvme0n1p1 10048.00 396720.00 0.00 0.00 0.11 39.48 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.09 51.60 nvme0n1p1 10251.00 405160.00 0.00 0.00 0.11 39.52 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 1.09 51.20
====================== eic 1 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191281 I/O Timings: shared read=3013.116 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191281 I/O Timings: shared read=3013.116 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 31.996 Prefetch Count: 163951 Prefetch Stalls: 1 Prefetch Skips: 9837060 Prefetch Resets: 3 Buffers: shared read=191281 I/O Timings: shared read=3013.116 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.550 Planning Time: 6.822 ms Execution Time: 5279.929 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 6710.00 268196.50 0.00 0.00 0.10 39.97 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.64 60.00 nvme0n1p1 7122.00 289432.00 0.00 0.00 0.09 40.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.63 62.00 nvme0n1p1 6811.00 276864.00 0.00 0.00 0.10 40.65 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.66 63.60 nvme0n1p1 7209.00 293144.00 0.00 0.00 0.09 40.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.65 72.80 nvme0n1p1 7519.00 306128.00 0.00 0.00 0.08 40.71 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.64 65.60 ====================== eic 8 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2625.972 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2625.972 -> Index Scan 
using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2625.972 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.656 Planning Time: 6.605 ms Execution Time: 5102.460 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 7681.00 278516.00 0.00 0.00 0.11 36.26 19.00 143.00 0.00 0.00 0.11 7.53 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 52.40 nvme0n1p1 8316.00 305400.00 0.00 0.00 0.11 36.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.89 62.00 nvme0n1p1 7695.00 282568.00 0.00 0.00 0.12 36.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.90 64.80 nvme0n1p1 8350.00 307192.00 0.00 0.00 0.11 36.79 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.89 55.60 nvme0n1p1 8399.00 308768.00 0.00 0.00 0.11 36.76 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 60.80 ====================== eic 16 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2705.352 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2705.352 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2705.352 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=3.004 Planning Time: 7.725 ms Execution 
Time: 5308.190 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 7107.00 256981.00 0.00 0.00 0.11 36.16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.78 51.60 nvme0n1p1 7879.00 289552.00 0.00 0.00 0.10 36.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.81 49.20 nvme0n1p1 7671.00 281904.00 0.00 0.00 0.11 36.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.82 55.20 nvme0n1p1 8136.00 298944.00 0.00 0.00 0.10 36.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.84 59.20 nvme0n1p1 7933.00 291688.00 0.00 0.00 0.11 36.77 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.85 56.80 ====================== eic 32 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2931.130 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2931.130 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2931.130 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.921 Planning Time: 6.654 ms Execution Time: 5572.837 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 6674.00 241316.00 0.00 0.00 0.12 36.16 1.00 9.50 0.00 0.00 0.00 9.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.79 61.20 nvme0n1p1 7342.00 269680.00 0.00 0.00 0.11 36.73 0.00 0.00 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.84 57.60 nvme0n1p1 7713.86 283516.83 0.00 0.00 0.11 36.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.85 59.41 nvme0n1p1 6987.00 256696.00 0.00 0.00 0.11 36.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.80 58.40 nvme0n1p1 7835.00 288344.00 0.00 0.00 0.10 36.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.80 54.00 ====================== eic 64 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2661.311 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2661.311 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2661.311 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.675 Planning Time: 6.858 ms Execution Time: 5239.994 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 6942.00 251236.00 0.00 0.00 0.12 36.19 1.00 9.50 0.00 0.00 0.00 9.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.85 62.00 nvme0n1p1 8176.00 300400.00 0.00 0.00 0.11 36.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 73.60 nvme0n1p1 7938.00 291496.00 0.00 0.00 0.11 36.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 72.80 nvme0n1p1 8235.00 302680.00 0.00 0.00 0.10 36.76 1.00 8.00 0.00 0.00 0.00 8.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 60.80 nvme0n1p1 8155.00 300080.00 0.00 0.00 0.10 36.80 0.00 0.00 0.00 0.00 0.00 0.00 
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.85 59.60 ====================== eic 128 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2607.508 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2607.508 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2607.508 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.742 Planning Time: 7.629 ms Execution Time: 5110.434 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 7380.00 267116.50 0.00 0.00 0.12 36.19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 70.40 nvme0n1p1 8236.00 302600.00 0.00 0.00 0.10 36.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 66.00 nvme0n1p1 7988.12 293394.06 0.00 0.00 0.11 36.73 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 61.78 nvme0n1p1 8310.00 305424.00 0.00 0.00 0.10 36.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 65.60 nvme0n1p1 8542.00 314264.00 0.00 0.00 0.10 36.79 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 60.00 ====================== eic 256 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2540.047 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared 
read=191271 I/O Timings: shared read=2540.047 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2540.047 Planning: Buffers: shared hit=46 read=22 I/O Timings: shared read=2.644 Planning Time: 6.982 ms Execution Time: 5036.594 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 7847.00 284436.00 0.00 0.00 0.11 36.25 1.00 9.50 0.00 0.00 0.00 9.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 56.80 nvme0n1p1 7730.00 284024.00 0.00 0.00 0.11 36.74 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 62.40 nvme0n1p1 8462.00 310752.00 0.00 0.00 0.10 36.72 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 62.80 nvme0n1p1 8352.00 307328.00 0.00 0.00 0.10 36.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 74.40 nvme0n1p1 8719.00 320712.00 0.00 0.00 0.10 36.78 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 55.60 ====================== eic 512 ======================== SET QUERY PLAN ---------------------------------------------------------------------------------------------------------- Aggregate (actual rows=1.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2510.486 -> Limit (actual rows=10000000.00 loops=1) Buffers: shared read=191271 I/O Timings: shared read=2510.486 -> Index Scan using pgbench_accounts_pkey on pgbench_accounts (actual rows=10000000.00 loops=1) Index Searches: 1 Prefetch Distance: 58.116 Prefetch Count: 163944 Prefetch Stalls: 1188 Prefetch Skips: 9836640 Prefetch Resets: 1190 Buffers: shared read=191271 I/O Timings: shared read=2510.486 Planning: Buffers: shared hit=46 read=22 I/O 
Timings: shared read=2.700 Planning Time: 7.197 ms Execution Time: 5037.071 ms (20 rows) Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util nvme0n1p1 7379.00 267108.50 0.00 0.00 0.12 36.20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.86 60.80 nvme0n1p1 8128.00 298680.00 0.00 0.00 0.11 36.75 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 62.00 nvme0n1p1 8437.00 309752.00 0.00 0.00 0.11 36.71 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.89 53.60 nvme0n1p1 8609.00 316528.00 0.00 0.00 0.10 36.77 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.88 60.00 nvme0n1p1 8534.00 314016.00 0.00 0.00 0.10 36.80 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.87 68.80
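A quick sanity check on the new counters, using numbers copied from the logs above: in every run "Prefetch Skips" is exactly 60 times "Prefetch Count" (for example 9837060 / 163951 with 64 batches, 9836640 / 163944 with 16 batches). That would fit about 61 index entries pointing into each heap block, with the first one issuing the request and the remaining 60 skipped in the callback, but that reading is an assumption and not something the logs state. The helper names below are hypothetical; the sketch only does the arithmetic.

```c
#include <assert.h>

/*
 * Whole number of skipped duplicates per prefetched block.
 * Returns -1 when no blocks were counted.
 */
static long
skips_per_block(long prefetch_count, long prefetch_skips)
{
    assert(prefetch_skips >= 0);

    if (prefetch_count <= 0)
        return -1;
    return prefetch_skips / prefetch_count;
}

/*
 * Remainder of the same division; 0 means the ratio is exact.
 * Returns -1 when no blocks were counted.
 */
static long
skips_remainder(long prefetch_count, long prefetch_skips)
{
    if (prefetch_count <= 0)
        return -1;
    return prefetch_skips % prefetch_count;
}
```

A ratio that stopped being constant between runs on the same data would suggest that the skip counter or the block counter is being updated in the wrong place.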
diff --git a/src/backend/access/index/indexam.c b/src/backend/access/index/indexam.c
index 4835c48b448..b0e50307a0e 100644
--- a/src/backend/access/index/indexam.c
+++ b/src/backend/access/index/indexam.c
@@ -407,12 +407,6 @@ index_beginscan_internal(Relation indexRelation,
     scan->parallel_scan = pscan;
     scan->xs_temp_snap = temp_snap;
 
-    /*
-     * No batching by default, so set it to NULL. Will be initialized later if
-     * batching is requested and AM supports it.
-     */
-    scan->xs_batches = NULL;
-
     return scan;
 }
 
@@ -463,6 +457,17 @@ index_rescan(IndexScanDesc scan,
                                             orderbys, norderbys);
 }
 
+void
+index_get_prefetch_stats(IndexScanDesc scan, int *accum, int *count, int *stalls, int *resets, int *skips)
+{
+    /* ugly */
+    if (scan->xs_heapfetch->rs != NULL)
+    {
+        read_stream_prefetch_stats(scan->xs_heapfetch->rs,
+                                   accum, count, stalls, resets, skips);
+    }
+}
+
 /* ----------------
  *      index_endscan - end a scan
  * ----------------
@@ -1883,6 +1888,7 @@ index_scan_stream_read_next(ReadStream *stream,
             /* same block as before, don't need to read it */
             if (scan->xs_batches->lastBlock == ItemPointerGetBlockNumber(tid))
             {
+                read_stream_skip_block(stream);
                 DEBUG_LOG("index_scan_stream_read_next: skip block (lastBlock)");
                 continue;
             }
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 7e2792ead71..9c95b4e2878 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -136,6 +136,7 @@ static void show_memoize_info(MemoizeState *mstate, List *ancestors,
                               ExplainState *es);
 static void show_hashagg_info(AggState *aggstate, ExplainState *es);
 static void show_indexsearches_info(PlanState *planstate, ExplainState *es);
+static void show_indexprefetch_info(PlanState *planstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
                                 ExplainState *es);
 static void show_instrumentation_count(const char *qlabel, int which,
@@ -1966,6 +1967,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
                 show_instrumentation_count("Rows Removed by Filter", 1,
                                            planstate, es);
             show_indexsearches_info(planstate, es);
+            show_indexprefetch_info(planstate, es);
             break;
         case T_IndexOnlyScan:
             show_scan_qual(((IndexOnlyScan *) plan)->indexqual,
@@ -1983,6 +1985,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
                 ExplainPropertyFloat("Heap Fetches", NULL,
                                      planstate->instrument->ntuples2, 0, es);
             show_indexsearches_info(planstate, es);
+            show_indexprefetch_info(planstate, es);
             break;
         case T_BitmapIndexScan:
             show_scan_qual(((BitmapIndexScan *) plan)->indexqualorig,
@@ -3889,6 +3892,50 @@ show_indexsearches_info(PlanState *planstate, ExplainState *es)
     ExplainPropertyUInteger("Index Searches", NULL, nsearches, es);
 }
 
+static void
+show_indexprefetch_info(PlanState *planstate, ExplainState *es)
+{
+    Plan       *plan = planstate->plan;
+
+    int         count = 0,
+                accum = 0,
+                stalls = 0,
+                resets = 0,
+                skips = 0;
+
+    if (!es->analyze)
+        return;
+
+    /* Initialize counters with stats from the local process first */
+    switch (nodeTag(plan))
+    {
+        case T_IndexScan:
+            {
+                IndexScanState *indexstate = ((IndexScanState *) planstate);
+
+                count = indexstate->iss_PrefetchCount;
+                accum = indexstate->iss_PrefetchAccum;
+                stalls = indexstate->iss_PrefetchStalls;
+                resets = indexstate->iss_ResetCount;
+                skips = indexstate->iss_SkipCount;
+
+                break;
+            }
+        default:
+            break;
+    }
+
+    if (count > 0)
+    {
+        ExplainPropertyFloat("Prefetch Distance", NULL, (accum * 1.0 / count), 3, es);
+        ExplainPropertyUInteger("Prefetch Count", NULL, count, es);
+        ExplainPropertyUInteger("Prefetch Stalls", NULL, stalls, es);
+        ExplainPropertyUInteger("Prefetch Skips", NULL, skips, es);
+        ExplainPropertyUInteger("Prefetch Resets", NULL, resets, es);
+    }
+}
+
+
 /*
  * Show exact/lossy pages for a BitmapHeapScan node
  */
diff --git a/src/backend/executor/nodeIndexscan.c b/src/backend/executor/nodeIndexscan.c
index 7fcaa37fe62..707badc4fdc 100644
--- a/src/backend/executor/nodeIndexscan.c
+++ b/src/backend/executor/nodeIndexscan.c
@@ -125,6 +125,13 @@ IndexNext(IndexScanState *node)
                          node->iss_OrderByKeys, node->iss_NumOrderByKeys);
     }
 
+    index_get_prefetch_stats(scandesc,
+                             &node->iss_PrefetchAccum,
+                             &node->iss_PrefetchCount,
+                             &node->iss_PrefetchStalls,
+                             &node->iss_ResetCount,
+                             &node->iss_SkipCount);
+
     /*
      * ok, now that we have what we need, fetch the next tuple.
      */
@@ -1088,6 +1095,12 @@ ExecInitIndexScan(IndexScan *node, EState *estate, int eflags)
         indexstate->iss_RuntimeContext = NULL;
     }
 
+    indexstate->iss_PrefetchAccum = 0;
+    indexstate->iss_PrefetchCount = 0;
+    indexstate->iss_PrefetchStalls = 0;
+    indexstate->iss_ResetCount = 0;
+    indexstate->iss_SkipCount = 0;
+
     /*
      * all done.
      */
diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 0e7f5557f5c..e41189f6612 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -106,6 +106,12 @@ struct ReadStream
     bool        advice_enabled;
     bool        temporary;
 
+    int         distance_accum;
+    int         distance_count;
+    int         distance_stalls;
+    int         reset_count;
+    int         skip_count;
+
     /*
      * One-block buffer to support 'ungetting' a block number, to resolve flow
      * control problems when I/Os are split.
@@ -180,6 +186,16 @@ read_stream_get_block(ReadStream *stream, void *per_buffer_data)
 {
     BlockNumber blocknum;
 
+    if (stream->distance > 1)
+    {
+        stream->distance_accum += stream->distance;
+        stream->distance_count += 1;
+    }
+    else
+    {
+        stream->distance_stalls += 1;
+    }
+
     blocknum = stream->buffered_blocknum;
     if (blocknum != InvalidBlockNumber)
         stream->buffered_blocknum = InvalidBlockNumber;
@@ -681,6 +697,12 @@ read_stream_begin_impl(int flags,
     stream->seq_until_processed = InvalidBlockNumber;
     stream->temporary = SmgrIsTemp(smgr);
 
+    stream->distance_accum = 0;
+    stream->distance_count = 0;
+    stream->distance_stalls = 0;
+    stream->reset_count = 0;
+    stream->skip_count = 0;
+
     /*
      * Skip the initial ramp-up phase if the caller says we're going to be
      * reading the whole relation.  This way we start out assuming we'll be
@@ -771,6 +793,17 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 {
     Buffer      buffer;
     int16       oldest_buffer_index;
+/*
+    if (stream->distance > 0)
+    {
+        stream->distance_accum += stream->distance;
+        stream->distance_count += 1;
+    }
+    else
+    {
+        stream->distance_stalls += 1;
+    }
+*/
 
 #ifndef READ_STREAM_DISABLE_FAST_PATH
 
@@ -1046,6 +1079,8 @@ read_stream_reset(ReadStream *stream)
 
     /* Start off assuming data is cached. */
     stream->distance = 1;
+
+    stream->reset_count += 1;
 }
 
 /*
@@ -1057,3 +1092,19 @@ read_stream_end(ReadStream *stream)
     read_stream_reset(stream);
     pfree(stream);
 }
+
+void
+read_stream_prefetch_stats(ReadStream *stream, int *accum, int *count, int *stalls, int *resets, int *skips)
+{
+    *accum = stream->distance_accum;
+    *count = stream->distance_count;
+    *stalls = stream->distance_stalls;
+    *resets = stream->reset_count;
+    *skips = stream->skip_count;
+}
+
+void
+read_stream_skip_block(ReadStream *stream)
+{
+    stream->skip_count++;
+}
diff --git a/src/include/access/genam.h b/src/include/access/genam.h
index 3a3a44be3a5..f1e5fdfd478 100644
--- a/src/include/access/genam.h
+++ b/src/include/access/genam.h
@@ -235,6 +235,7 @@ extern bytea *index_opclass_options(Relation indrel, AttrNumber attnum,
                                     Datum attoptions, bool validate);
 
 extern IndexScanBatch index_batch_alloc(int maxitems, bool want_itup);
+extern void index_get_prefetch_stats(IndexScanDesc scan, int *accum, int *count, int *stalls, int *resets, int *skips);
 
 /*
  * index access method support routines (in genam.c)
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index e107d6e5f81..e91bc7ea35f 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1722,6 +1722,12 @@ typedef struct IndexScanState
     IndexScanInstrumentation iss_Instrument;
     SharedIndexScanInstrumentation *iss_SharedInfo;
 
+    int         iss_PrefetchAccum;
+    int         iss_PrefetchCount;
+    int         iss_PrefetchStalls;
+    int         iss_ResetCount;
+    int
iss_SkipCount; + /* These are needed for re-checking ORDER BY expr ordering */ pairingheap *iss_ReorderQueue; bool iss_ReachedEnd; diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h index 9b0d65161d0..34e184a1690 100644 --- a/src/include/storage/read_stream.h +++ b/src/include/storage/read_stream.h @@ -102,4 +102,7 @@ extern ReadStream *read_stream_begin_smgr_relation(int flags, extern void read_stream_reset(ReadStream *stream); extern void read_stream_end(ReadStream *stream); +extern void read_stream_prefetch_stats(ReadStream *stream, int *accum, int *count, int *stalls, int *resets, int *skips); +extern void read_stream_skip_block(ReadStream *stream); + #endif /* READ_STREAM_H */
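
FWIW, to make the counter semantics easier to reason about, here is a small standalone sketch of the accounting done in read_stream_get_block(). It is not part of the patch: PrefetchStats, prefetch_stats_record() and prefetch_stats_avg_distance() are hypothetical names used only for illustration, and the struct is a cut-down stand-in for the fields added to ReadStream. The point it tries to show is that a request with distance <= 1 is counted as a stall and is left out of the average, so "Prefetch Distance" is the mean over the non-stalled requests only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters the patch adds to ReadStream. */
typedef struct PrefetchStats
{
	int			distance_accum;		/* sum of distances, non-stalled requests */
	int			distance_count;		/* requests seen with distance > 1 */
	int			distance_stalls;	/* requests seen with distance <= 1 */
} PrefetchStats;

/* Same branch structure as the block added to read_stream_get_block(). */
static void
prefetch_stats_record(PrefetchStats *stats, int distance)
{
	assert(stats != NULL);

	if (distance > 1)
	{
		stats->distance_accum += distance;
		stats->distance_count += 1;
	}
	else
	{
		stats->distance_stalls += 1;
	}
}

/*
 * Average distance computed the way show_indexprefetch_info() does it.
 * Returns 0.0 when nothing was counted (an assumption of this sketch;
 * EXPLAIN simply prints no prefetch lines in that case).
 */
static double
prefetch_stats_avg_distance(const PrefetchStats *stats)
{
	assert(stats != NULL);

	if (stats->distance_count == 0)
		return 0.0;

	return stats->distance_accum * 1.0 / stats->distance_count;
}
```

Recording distances 1, 4 and 8 on a zeroed PrefetchStats gives one stall, a count of 2 and an average distance of 6.0, i.e. the stalled request does not pull the average down.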