On Thu Aug 14, 2025 at 3:41 PM EDT, Andres Freund wrote:
> Hm, that is somewhat curious.
>
> I wonder if there's some wait time that's not being captured by "I/O
> Timings". A first thing to do would be to just run strace --summary-only while
> running the query, and see if there are syscall wait times that seem too long.
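
For reference, the summaries below come from attaching strace to the backend
running the query, roughly along these lines (the pid placeholder is
illustrative; pg_backend_pid() in the query's session is one way to get it):

    strace --summary-only -p <backend pid>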

For the slow, sequential backwards scan query:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.271216           4     66808           io_uring_enter
  0.00    0.000004           4         1           sendto
  0.00    0.000001           0         2         1 recvfrom
  0.00    0.000000           0         5           lseek
  0.00    0.000000           0         1           epoll_wait
  0.00    0.000000           0         4           openat
------ ----------- ----------- --------- --------- ----------------
100.00    0.271221           4     66821         1 total

For the fast, random backwards scan query:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.99    0.351518           4     77819           io_uring_enter
  0.00    0.000007           2         3         1 epoll_wait
  0.00    0.000006           6         1           sendto
  0.00    0.000003           1         3         2 recvfrom
  0.00    0.000002           2         1           read
  0.00    0.000002           2         1         1 rt_sigreturn
  0.00    0.000002           2         1           getpid
  0.00    0.000002           1         2           kill
  0.00    0.000000           0         3           lseek
------ ----------- ----------- --------- --------- ----------------
100.00    0.351542           4     77834         4 total

> What effective_io_concurrency and io_max_concurrency setting are you using? If
> there are no free IO handles that's currently not nicely reported (because
> it's unclear how exactly to do so, see comment above pgaio_io_acquire_nb()).

effective_io_concurrency is 100.  io_max_concurrency is 64.  Nothing out of
the ordinary there.
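
For completeness, both are plain GUCs, so they can be double-checked with
something along these lines:

    psql -c "SHOW effective_io_concurrency;" -c "SHOW io_max_concurrency;"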

> iostat -xmy /path/to/block/device
>
> I'd like to see the difference in average IO size (rareq-sz), queue depth
> (aqu-sz) and completion time (r_await) between the fast and slow cases.

I'll show output at one-second intervals.
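
This is from something along the lines of the following (the exact device
path is an assumption; nvme0n1 is taken from the output):

    iostat -xmy /dev/nvme0n1 1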

Slow, sequential backwards scan query
-------------------------------------

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1       24613.00    192.29     0.00   0.00    0.20     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    4.92  53.20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.22    0.00    0.44    0.85    0.00   98.50

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1       25320.00    197.81     0.00   0.00    0.20     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.18  51.20

Fast, random backwards scan query
---------------------------------

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1       27140.59    212.04     0.00   0.00    0.20     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    5.50  23.37

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.50    0.00    0.84    0.00    0.00   98.66

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
nvme0n1       50401.00    393.76     0.00   0.00    0.20     8.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00   10.06  41.60

-- 
Peter Geoghegan

