On 3/17/24 20:36, Tomas Vondra wrote: > > ... > >> Besides a lot of other things, I finally added debugging fprintfs printing >> the >> pid, (prefetch, read), block number. Even looking at tiny excerpts of the >> large amount of output that generates shows that two iterators were out of >> sync. >> > > Thanks. I did experiment with fprintf, but it's quite cumbersome, so I > was hoping you came up with some smart way to trace this king of stuff. > For example I was wondering if ebpf would be a more convenient way. >
FWIW I just realized why I failed to identify this "late prefetch" issue during my investigation. I was experimenting with instrumenting this by adding a LD_PRELOAD library, logging all pread/fadvise calls. But the FilePrefetch call is skipped in the page is already in shared buffers, so this case "disappeared" during processing which matched the two calls by doing an "inner join". That being said, I think tracing this using LD_PRELOAD or perf may be more convenient way to see what's happening. For example I ended up doing this: perf record -a -e syscalls:sys_enter_fadvise64 \ -e syscalls:sys_exit_fadvise64 \ -e syscalls:sys_enter_pread64 \ -e syscalls:sys_exit_pread64 perf script -ns Alternatively, perf-trace can be used and prints the filename too (but time has ms resolution only). Processing this seems comparable to the fprintf approach. It still has the issue that some of the fadvise calls may be absent if the prefetch iterator gets too far behind, but I think that can be detected / measured by simply counting the fadvise calls, and comparing them to pread calls. We expect these to be about the same, so (#pread - #fadvise) / #fadvise is a measure of how many were "late" and skipped. It also seems better than fprintf because it traces the actual syscalls, not just calls to glibc wrappers. For example I saw this postgres 54769 [001] 33768.771524828: syscalls:sys_enter_pread64: ..., pos: 0x30d04000 postgres 54769 [001] 33768.771526867: syscalls:sys_exit_pread64: 0x2000 postgres 54820 [000] 33768.771527473: syscalls:sys_enter_fadvise64: ..., offset: 0x30d04000, ... postgres 54820 [000] 33768.771528320: syscalls:sys_exit_fadvise64: 0x0 which is clearly a case where we issue fadvise after pread of the same block already completed. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company