Hi,

On 2019-12-31 17:05:31 +1300, Thomas Munro wrote:
> There is one potentially interesting case that doesn't require any
> kind of shared cache invalidation AFAICS. XLogReadBufferExtended()
> calls smgrnblocks() for every buffer access, even if the buffer is
> already in our buffer pool.

Yea, that's really quite bad*. The bit about doing so even when the
buffer is already in the buffer pool is particularly absurd. Needing to
have special handling in mdcreate() for XLogReadBufferExtended() always
calling it is also fairly ugly.
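Something very roughly like the following is the direction I have in
mind. This is only a sketch: cached_nblocks is a made-up field (it
would have to be added to SMgrRelationData as an array indexed by fork
number, initialized to InvalidBlockNumber in smgropen()), and trusting
the cache relies on the assumption that only the startup process
extends relations while in recovery:

/* hypothetical change to src/backend/storage/smgr/smgr.c */
BlockNumber
smgrnblocks(SMgrRelation reln, ForkNumber forknum)
{
	BlockNumber	result;

	/*
	 * During recovery only the startup process extends relations, so a
	 * size we cached earlier cannot have become stale under us, and we
	 * can skip the lseek(SEEK_END) that md.c's mdnblocks() boils down
	 * to.
	 */
	if (InRecovery && reln->cached_nblocks[forknum] != InvalidBlockNumber)
		return reln->cached_nblocks[forknum];

	result = smgrsw[reln->smgr_which].smgr_nblocks(reln, forknum);
	reln->cached_nblocks[forknum] = result;

	return result;
}

That would keep XLogReadBufferExtended()'s per-record smgrnblocks()
call from turning into a system call each time, without needing any
shared cache invalidation.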
> It doesn't seem great that we are effectively making system calls for
> most WAL records we replay, but, sadly, in this case the patch didn't
> really make any measurable difference when run without strace on this
> Linux VM. I suspect there is some workload and stack where it would
> make a difference (CF the read(postmaster pipe) call for every WAL
> record that was removed), but this is just something I noticed in
> passing while working on something else, so I haven't investigated
> much.

I wonder if that's just because your workload is bottlenecked elsewhere:

> postgres -D pgdata -c checkpoint_timeout=60min
> In another shell:
> pgbench -i -s100 postgres
> pgbench -M prepared -T60 postgres
> killall -9 postgres
> mv pgdata pgdata-save

With scale 100 (roughly 1.5GB of data), but the default shared_buffers
(128MB), you'll frequently hit the OS for reads and writes. Those go
through the same file metadata in the kernel, but additionally involve
memcpys between kernel and userspace, so a saved smgrnblocks() syscall
is unlikely to be measurable on top of that.

A word of caution about strace's -c: In my experience the total time
measurements it reports are quite imprecise. I suspect that some of the
overhead of ptracing gets attributed to the syscalls themselves, which
makes frequent syscalls appear relatively more expensive than they
really are.

Greetings,

Andres Freund

* it insults my sense of aesthetics