On 17/06/2025 3:22 am, Tom Lane wrote:
Konstantin Knizhnik<knizh...@garret.ru> writes:
On 16/06/2025 6:11 pm, Andres Freund wrote:
I unfortunately can't repro this issue so far.
But unfortunately it means that the problem is not fixed.
FWIW, I get similar results to Andres' on a Mac Mini M4 Pro
using MacPorts' current compiler release (clang version 19.1.7).
The currently-proposed test case fails within a few minutes on
e9a3615a5^ but doesn't fail in a couple of hours on e9a3615a5.

However, I cannot repro that on a slightly older Mini M1 using Apple's
current release (clang-1700.0.13.5, which per wikipedia is really LLVM
19.1.4).  It seems to work fine even without e9a3615a5.  So the whole
thing is still depressingly phase-of-the-moon-dependent.

I don't doubt that Konstantin has found a different issue, but
it's hard to be sure about the fix unless we can get it to be
more reproducible.  Neither of my machines has ever shown the
symptom he's getting.

                        regards, tom lane


Unfortunately, I am still able to reproduce the assertion failure (with Andres' patch, but without my "fixes", i.e. uint8 instead of bitfields).

Apple M2 Pro, Ventura 13.7.6, clang 15.0

Postgres build options: --without-icu --enable-debug --enable-cassert CFLAGS=-O0

Postgres config:

io_max_concurrency=1
io_combine_limit=1
synchronize_seqscans=false
restart_after_crash=false
max_parallel_workers_per_gather=0
fsync=off

Scenario proposed by Andres:
c=16; pgbench -c $c -j $c -M prepared -n -f <(echo "select count(*) FROM large;") -T 30000 -P 10


Now it takes me much more time to reproduce the error: overnight, a full
30000-second run passed normally.
I thought that it might somehow be caused or made more likely by hibernation.
When I reproduced it last time, I had not changed the default display setting
"Prevent automatic sleeping on power adapter when the display is off"
(disabled).
That caused the strange timing effect I reported: when the system hibernates, the
alarm is also "hibernated", so the total test time is much longer than the one
specified with the -T option.
Now I tried to force hibernation explicitly by periodically closing the laptop lid;
it didn't help. So I disabled this option once again and let the test run. Several
times it continued execution normally after wakeup, but finally I got the assertion
failure:

progress: 18290.0 s, 7.3 tps, lat 2229.243 ms stddev 147.211, 0 failed
progress: 18300.0 s, 7.3 tps, lat 2164.216 ms stddev 176.251, 0 failed
progress: 18310.0 s, 7.5 tps, lat 2105.803 ms stddev 202.003, 0 failed
progress: 18320.0 s, 7.7 tps, lat 2162.209 ms stddev 209.344, 0 failed
progress: 18330.0 s, 7.0 tps, lat 2157.891 ms stddev 181.369, 0 failed
progress: 18340.0 s, 7.6 tps, lat 2120.269 ms stddev 169.287, 0 failed
progress: 18350.0 s, 7.3 tps, lat 2178.657 ms stddev 159.984, 0 failed
pgbench: error: client 4 aborted in command 0 (SQL) of script 0; perhaps the backend died while processing
WARNING:  terminating connection because of crash of another server process

I'm not sure whether it is related to hibernation (it looks like it happens at the
moment the computer is about to hibernate, but I'm not sure how to check that).
The backtrace is the usual one:

  * frame #0: 0x0000000187248704 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018727fc28 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018718dae8 libsystem_c.dylib`abort + 180
    frame #3: 0x00000001034b2080 postgres`ExceptionalCondition(conditionName="ioh->op == PGAIO_OP_INVALID", fileName="aio_io.c", lineNumber=161) at assert.c:66:2
    frame #4: 0x00000001031bac04 postgres`pgaio_io_before_start(ioh=0x000000010c9f6e40) at aio_io.c:161:2
    frame #5: 0x00000001031baac4 postgres`pgaio_io_start_readv(ioh=0x000000010c9f6e40, fd=10, iovcnt=1, offset=180944896) at aio_io.c:81:2
    frame #6: 0x00000001031d985c postgres`FileStartReadV(ioh=0x000000010c9f6e40, file=3, iovcnt=1, offset=180944896, wait_event_info=167772181) at fd.c:2241:2
    frame #7: 0x000000010322c894 postgres`mdstartreadv(ioh=0x000000010c9f6e40, reln=0x000000011f0289c8, forknum=MAIN_FORKNUM, blocknum=22088, buffers=0x000000016d2dd998, nblocks=1) at md.c:1019:8
    frame #8: 0x000000010322fe4c postgres`smgrstartreadv(ioh=0x000000010c9f6e40, reln=0x000000011f0289c8, forknum=MAIN_FORKNUM, blocknum=22088, buffers=0x000000016d2dd998, nblocks=1) at smgr.c:758:2
    frame #9: 0x00000001031c2b0c postgres`AsyncReadBuffers(operation=0x000000010f81b910, nblocks_progress=0x000000016d2ddeb4) at bufmgr.c:1959:3
    frame #10: 0x00000001031c1ce8 postgres`StartReadBuffersImpl(operation=0x000000010f81b910, buffers=0x000000010f81b3e0, blockNum=22088, nblocks=0x000000016d2ddeb4, flags=0, allow_forwarding=true) at bufmgr.c:1428:18
    frame #11: 0x00000001031c182c postgres`StartReadBuffers(operation=0x000000010f81b910, buffers=0x000000010f81b3e0, blockNum=22088, nblocks=0x000000016d2ddeb4, flags=0) at bufmgr.c:1500:9
    frame #12: 0x00000001031bee44 postgres`read_stream_start_pending_read(stream=0x000000010f81b378) at read_stream.c:335:14
    frame #13: 0x00000001031be528 postgres`read_stream_look_ahead(stream=0x000000010f81b378) at read_stream.c:493:3
    frame #14: 0x00000001031be0c0 postgres`read_stream_next_buffer(stream=0x000000010f81b378, per_buffer_data=0x0000000000000000) at read_stream.c:971:2
    frame #15: 0x0000000102bb3d34 postgres`heap_fetch_next_buffer(scan=0x000000010f81ae58, dir=ForwardScanDirection) at heapam.c:675:18
    frame #16: 0x0000000102ba4a88 postgres`heapgettup_pagemode(scan=0x000000010f81ae58, dir=ForwardScanDirection, nkeys=0, key=0x0000000000000000) at heapam.c:1037:3
    frame #17: 0x0000000102ba5058 postgres`heap_getnextslot(sscan=0x000000010f81ae58, direction=ForwardScanDirection, slot=0x000000010f811330) at heapam.c:1391:3
    frame #18: 0x0000000102f26448 postgres`table_scan_getnextslot(sscan=0x000000010f81ae58, direction=ForwardScanDirection, slot=0x000000010f811330) at tableam.h:1031:9
    frame #19: 0x0000000102f26254 postgres`SeqNext(node=0x000000010f811110) at nodeSeqscan.c:81:6
    frame #20: 0x0000000102f2691c postgres`ExecScanFetch(node=0x000000010f811110, epqstate=0x0000000000000000, accessMtd=(postgres`SeqNext at nodeSeqscan.c:52), recheckMtd=(postgres`SeqRecheck at nodeSeqscan.c:91)) at execScan.h:126:9

If you are interested, I have the core file.
Maybe it is a separate problem, not related to the race condition caused by the
missing memory barrier. But frankly speaking, that seems unlikely.
I think hibernation just affects process scheduling in a way that makes this
race more likely to occur.
