On 13/06/2025 11:20 pm, Andres Freund wrote:
Attached is a patch that fixes the problem for me. Alexander, Konstantin,
could you verify that it also fixes the problem for you?
Given that it does address the problem for me, I'm inclined to push this
fairly soon, the barrier is pretty obviously required.
Unfortunately I still able to reproduce assertion failure after 17450
seconds (`--enable-debug --enable-cassert CFLAGS=-O0`):
* frame #0: 0x0000000187248704 libsystem_kernel.dylib`__pthread_kill + 8
frame #1: 0x000000018727fc28 libsystem_pthread.dylib`pthread_kill + 288
frame #2: 0x000000018718dae8 libsystem_c.dylib`abort + 180
frame #3: 0x00000001011a0fbc
postgres`ExceptionalCondition(conditionName="ioh->op ==
PGAIO_OP_INVALID", fileName="aio_io.c", lineNumber=161) at assert.c:66:2
frame #4: 0x0000000100ea9b40
postgres`pgaio_io_before_start(ioh=0x000000010a6e22f0) at aio_io.c:161:2
frame #5: 0x0000000100ea9a00
postgres`pgaio_io_start_readv(ioh=0x000000010a6e22f0, fd=12, iovcnt=1,
offset=85565440) at aio_io.c:81:2
frame #6: 0x0000000100ec8798
postgres`FileStartReadV(ioh=0x000000010a6e22f0, file=4, iovcnt=1,
offset=85565440, wait_event_info=167772181) at fd.c:2241:2
frame #7: 0x0000000100f1b7d0
postgres`mdstartreadv(ioh=0x000000010a6e22f0, reln=0x000000010b0289c8,
forknum=MAIN_FORKNUM, blocknum=10445, buffers=0x000000016f5ed998,
nblocks=1) at md.c:1019:8
frame #8: 0x0000000100f1ed88
postgres`smgrstartreadv(ioh=0x000000010a6e22f0, reln=0x000000010b0289c8,
forknum=MAIN_FORKNUM, blocknum=10445, buffers=0x000000016f5ed998,
nblocks=1) at smgr.c:758:2
frame #9: 0x0000000100eb1a48
postgres`AsyncReadBuffers(operation=0x000000011d80d108,
nblocks_progress=0x000000016f5edeb4) at bufmgr.c:1959:3
frame #10: 0x0000000100eb0c24
postgres`StartReadBuffersImpl(operation=0x000000011d80d108,
buffers=0x000000011d80cdf4, blockNum=10445, nblocks=0x000000016f5edeb4,
flags=0, allow_forwarding=true) at bufmgr.c:1428:18
I wonder if we also need second write barrier in pgaio_io_update_state:
/*
* Ensure the changes signified by the new state are visible before the
* new state becomes visible.
*/
pg_write_barrier();
ioh->state = new_state;
/*
* Make sure that status is written
*/
pg_write_barrier();
Otherwise is is not clear when status is written.
Also I think that replacing bitfields with `uint8` and may be even with
`int`, is good idea at least to avoids false sharing.