Hi!

b36fbd9f8d message says that inconsistency may still be possible because
statistics are not completely consistent for a single scan of
pg_stat_replication_slots under concurrent replication slot drop or
creation activity.

Seems there is a reproduction of such a case via isolation test.
Please see the repslot_stat.spec attached.

In an build with asserts performing such a test will result in a
crash in the checkpointer during server shutdown.

Please see postmaster.log and bt.txt. There are
wal_level = logical
max_replication_slots = 4
logical_decoding_work_mem = 64kB
autovacuum_naptime = 1d
log_min_messages = DEBUG
in the postgresql.conf.

The assert occures due to this stats entry:
(gdb) p *ps
$1 = {key = {kind = PGSTAT_KIND_REPLSLOT, dboid = 0, objoid = 0}, dropped = 
true, refcount = {value = 0}, generation = {value = 1}, body = 1099512025088}

Also the commit message says that such an issue should unlikely be a problem in 
practice.
So i doubt whether this should be treated as a bug and to be fixed?

Would be glad to figure this out.

With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
setup {
	SELECT slot_name FROM pg_create_logical_replication_slot('test', 'pgoutput');
}

teardown {
	SELECT pg_drop_replication_slot('test');
}

session s1

step select {
		SELECT data
		FROM
		pg_logical_slot_peek_binary_changes('test', NULL, NULL);
}

permutation
	select

permutation
	select
2024-12-04 02:59:14.764 MSK postmaster[189291] DEBUG:  registering background worker "logical replication launcher"
2024-12-04 02:59:14.764 MSK postmaster[189291] DEBUG:  mmap(150994944) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2024-12-04 02:59:14.768 MSK postmaster[189291] DEBUG:  dynamic shared memory system will support 674 segments
2024-12-04 02:59:14.768 MSK postmaster[189291] DEBUG:  created dynamic shared memory control segment 2406959938 (26976 bytes)
2024-12-04 02:59:14.769 MSK postmaster[189291] DEBUG:  max_safe_fds = 983, usable_fds = 1000, already_open = 7
2024-12-04 02:59:14.769 MSK postmaster[189291] LOG:  starting PostgreSQL 17.2 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
2024-12-04 02:59:14.769 MSK postmaster[189291] LOG:  listening on Unix socket "/tmp/pg_regress-wgpIvh/.s.PGSQL.55314"
2024-12-04 02:59:14.770 MSK checkpointer[189292] DEBUG:  checkpointer updated shared memory configuration values
2024-12-04 02:59:14.770 MSK startup[189294] LOG:  database system was shut down at 2024-12-04 02:59:14 MSK
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  checkpoint record is at 0/14DDD90
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  redo record is at 0/14DDD90; shutdown true
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  next transaction ID: 738; next OID: 13541
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  next MultiXactId: 1; next MultiXactOffset: 0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  oldest unfrozen transaction ID: 730, in database 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  oldest MultiXactId: 1, in database 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  commit timestamp Xid oldest/newest: 0/0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  transaction ID wrap limit is 2147484377, limited by database with OID 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  starting up replication slots
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  xmin required by slots: data 0, catalog 0
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  starting up replication origin progress state
2024-12-04 02:59:14.771 MSK startup[189294] DEBUG:  reading stats file "pg_stat/pgstat.stat"
2024-12-04 02:59:14.772 MSK startup[189294] DEBUG:  removing permanent stats file "pg_stat/pgstat.stat"
2024-12-04 02:59:14.773 MSK startup[189294] DEBUG:  MultiXactId wrap limit is 2147483648, limited by database with OID 1
2024-12-04 02:59:14.773 MSK startup[189294] DEBUG:  MultiXact member stop limit is now 4294914944 based on MultiXact 1
2024-12-04 02:59:14.775 MSK postmaster[189291] DEBUG:  starting background worker process "logical replication launcher"
2024-12-04 02:59:14.776 MSK autovacuum launcher[189296] DEBUG:  autovacuum launcher started
2024-12-04 02:59:14.776 MSK postmaster[189291] LOG:  database system is ready to accept connections
2024-12-04 02:59:14.776 MSK logical replication launcher[189297] DEBUG:  logical replication launcher started
2024-12-04 02:59:14.799 MSK postmaster[189291] DEBUG:  forked new backend, pid=189298 socket=9
2024-12-04 02:59:14.818 MSK postmaster[189291] DEBUG:  server process (PID 189298) exited with exit code 0
2024-12-04 02:59:14.829 MSK postmaster[189291] DEBUG:  forked new backend, pid=189300 socket=9
2024-12-04 02:59:14.868 MSK postmaster[189291] DEBUG:  server process (PID 189300) exited with exit code 0
2024-12-04 02:59:14.888 MSK postmaster[189291] DEBUG:  forked new backend, pid=189303 socket=9
2024-12-04 02:59:14.902 MSK client backend[189303] isolation/repslot_stat DEBUG:  parse <unnamed>: SELECT set_config('application_name',
	  current_setting('application_name') || '/' || $1,
	  false)
2024-12-04 02:59:14.902 MSK client backend[189303] isolation/repslot_stat DEBUG:  bind <unnamed> to <unnamed>
2024-12-04 02:59:14.903 MSK postmaster[189291] DEBUG:  forked new backend, pid=189304 socket=9
2024-12-04 02:59:14.906 MSK client backend[189304] isolation/repslot_stat DEBUG:  parse <unnamed>: SELECT set_config('application_name',
	  current_setting('application_name') || '/' || $1,
	  false)
2024-12-04 02:59:14.907 MSK client backend[189304] isolation/repslot_stat DEBUG:  bind <unnamed> to <unnamed>
2024-12-04 02:59:14.907 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  parse isolationtester_waiting: SELECT pg_catalog.pg_isolation_test_session_is_blocked($1, '{189304}')
2024-12-04 02:59:14.908 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  snapshot of 0+0 running transaction ids (lsn 0/18FE8F8 oldest xid 740 latest complete 739 next xid 740)
2024-12-04 02:59:14.908 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  xmin required by slots: data 0, catalog 740
2024-12-04 02:59:14.909 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  searching for logical decoding starting point, starting at 0/18FE8C0
2024-12-04 02:59:14.909 MSK client backend[189303] isolation/repslot_stat/control connection LOG:  logical decoding found consistent point at 0/18FE8C0
2024-12-04 02:59:14.909 MSK client backend[189303] isolation/repslot_stat/control connection DETAIL:  There are no running transactions.
2024-12-04 02:59:14.909 MSK client backend[189303] isolation/repslot_stat/control connection STATEMENT:  
		SELECT slot_name FROM pg_create_logical_replication_slot('test', 'pgoutput');
	
2024-12-04 02:59:14.910 MSK client backend[189304] isolation/repslot_stat/s1 ERROR:  option "proto_version" missing
2024-12-04 02:59:14.910 MSK client backend[189304] isolation/repslot_stat/s1 CONTEXT:  slot "test", output plugin "pgoutput", in the startup callback
2024-12-04 02:59:14.910 MSK client backend[189304] isolation/repslot_stat/s1 STATEMENT:  
			SELECT data
			FROM
			pg_logical_slot_peek_binary_changes('test', NULL, NULL);
	
2024-12-04 02:59:14.911 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  xmin required by slots: data 0, catalog 0
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  snapshot of 0+0 running transaction ids (lsn 0/18FE930 oldest xid 740 latest complete 739 next xid 740)
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  xmin required by slots: data 0, catalog 740
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  searching for logical decoding starting point, starting at 0/18FE8F8
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection LOG:  logical decoding found consistent point at 0/18FE8F8
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection DETAIL:  There are no running transactions.
2024-12-04 02:59:14.917 MSK client backend[189303] isolation/repslot_stat/control connection STATEMENT:  
		SELECT slot_name FROM pg_create_logical_replication_slot('test', 'pgoutput');
	
2024-12-04 02:59:14.918 MSK client backend[189304] isolation/repslot_stat/s1 ERROR:  option "proto_version" missing
2024-12-04 02:59:14.918 MSK client backend[189304] isolation/repslot_stat/s1 CONTEXT:  slot "test", output plugin "pgoutput", in the startup callback
2024-12-04 02:59:14.918 MSK client backend[189304] isolation/repslot_stat/s1 STATEMENT:  
			SELECT data
			FROM
			pg_logical_slot_peek_binary_changes('test', NULL, NULL);
	
2024-12-04 02:59:14.919 MSK client backend[189303] isolation/repslot_stat/control connection DEBUG:  xmin required by slots: data 0, catalog 0
2024-12-04 02:59:14.922 MSK postmaster[189291] DEBUG:  server process (PID 189303) exited with exit code 0
2024-12-04 02:59:14.929 MSK postmaster[189291] DEBUG:  server process (PID 189304) exited with exit code 0
2024-12-04 02:59:14.933 MSK postmaster[189291] DEBUG:  postmaster received shutdown request signal
2024-12-04 02:59:14.933 MSK postmaster[189291] LOG:  received fast shutdown request
2024-12-04 02:59:14.933 MSK postmaster[189291] LOG:  aborting any active transactions
2024-12-04 02:59:14.933 MSK autovacuum launcher[189296] DEBUG:  autovacuum launcher shutting down
2024-12-04 02:59:14.933 MSK logical replication launcher[189297] DEBUG:  logical replication launcher shutting down
2024-12-04 02:59:14.934 MSK postmaster[189291] LOG:  background worker "logical replication launcher" (PID 189297) exited with exit code 1
2024-12-04 02:59:14.935 MSK checkpointer[189292] LOG:  shutting down
2024-12-04 02:59:14.935 MSK checkpointer[189292] LOG:  checkpoint starting: shutdown immediate
2024-12-04 02:59:14.935 MSK checkpointer[189292] DEBUG:  performing replication slot checkpoint
2024-12-04 02:59:14.942 MSK checkpointer[189292] DEBUG:  attempting to remove WAL segments older than log file 000000000000000000000000
2024-12-04 02:59:14.942 MSK checkpointer[189292] DEBUG:  SlruScanDirectory invoking callback on pg_subtrans/0000
2024-12-04 02:59:14.943 MSK checkpointer[189292] LOG:  checkpoint complete: wrote 924 buffers (5.6%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.008 s, sync=0.001 s, total=0.008 s; sync files=0, longest=0.000 s, average=0.000 s; distance=4226 kB, estimate=4226 kB; lsn=0/18FE930, redo lsn=0/18FE930
2024-12-04 02:59:14.943 MSK checkpointer[189292] DEBUG:  writing stats file "pg_stat/pgstat.stat"
TRAP: failed Assert("!ps->dropped"), File: "pgstat.c", Line: 1400, PID: 189292
postgres: checkpointer (ExceptionalCondition+0xb5)[0x556783109b82]
postgres: checkpointer (+0x6c2721)[0x556782f43721]
postgres: checkpointer (pgstat_before_server_shutdown+0xaf)[0x556782f42101]
postgres: checkpointer (shmem_exit+0x9d)[0x556782edc7fd]
postgres: checkpointer (+0x65b6b3)[0x556782edc6b3]
postgres: checkpointer (proc_exit+0x69)[0x556782edc600]
postgres: checkpointer (+0x5ad2f8)[0x556782e2e2f8]
postgres: checkpointer (CheckpointerMain+0x34e)[0x556782e2ddbc]
postgres: checkpointer (postmaster_child_launch+0x113)[0x556782e2f63f]
postgres: checkpointer (+0x5b4e05)[0x556782e35e05]
postgres: checkpointer (PostmasterMain+0x1358)[0x556782e321d5]
postgres: checkpointer (+0x465c64)[0x556782ce6c64]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x7f398dd37083]
postgres: checkpointer (_start+0x2e)[0x55678295e90e]
2024-12-04 02:59:15.151 MSK postmaster[189291] LOG:  checkpointer process (PID 189292) was terminated by signal 6: Aborted
2024-12-04 02:59:15.151 MSK postmaster[189291] LOG:  terminating any other active server processes
2024-12-04 02:59:15.151 MSK postmaster[189291] LOG:  abnormal database system shutdown
2024-12-04 02:59:15.151 MSK postmaster[189291] DEBUG:  cleaning up orphaned dynamic shared memory with ID 1292510994
2024-12-04 02:59:15.151 MSK postmaster[189291] DEBUG:  cleaning up dynamic shared memory control segment with ID 2406959938
2024-12-04 02:59:15.170 MSK postmaster[189291] LOG:  database system is shut down
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt full
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {4194304, 140733362963048, 0, 0, 140733362963056, 0, 0, 
0, 0, 139885183317226, 140733362963064, 93903060855054, 128, 0, 
140733362960704, 139885170600000}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f398dd35859 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7f398dec8bf7, sa_sigaction 
= 0x7f398dec8bf7}, sa_mask = {__val = {1, 139885170958366, 3, 140733362960884, 
12, 139885170958370, 2, 3978425822577950720, 7291390903560254008, 
93903060855008, 7291999826860081840, 0, 10474404646378503936, 140733362960976, 
184, 140733362961856}}, sa_flags = -2104104736, sa_restorer = 0x7fff0a1b82b0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#2  0x0000556783109bb5 in ExceptionalCondition (conditionName=0x5567833489c2 
"!ps->dropped", fileName=0x556783348497 "pgstat.c", lineNumber=1400) at 
assert.c:66
No locals.
#3  0x0000556782f43721 in pgstat_write_statsfile () at pgstat.c:1400
        shstats = 0x7f39822ff940
        kind_info = 0x0
        fpout = 0x5567b8443220
        format_id = 27638956
        tmpfile = 0x55678334895e "pg_stat/pgstat.tmp"
        statfile = 0x55678334843e "pg_stat/pgstat.stat"
        hstat = {hash_table = 0x5567b843f830, curbucket = 878, nbuckets = 1024, 
curitem = 0x7f398b0f1458, pnextitem = 154640, curpartition = 109, exclusive = 
false}
        ps = 0x7f398b0f1468
        __func__ = "pgstat_write_statsfile"
#4  0x0000556782f42101 in pgstat_before_server_shutdown (code=0, arg=0) at 
pgstat.c:485
No locals.
#5  0x0000556782edc7fd in shmem_exit (code=0) at ipc.c:243
        __func__ = "shmem_exit"
#6  0x0000556782edc6b3 in proc_exit_prepare (code=0) at ipc.c:198
        __func__ = "proc_exit_prepare"
#7  0x0000556782edc600 in proc_exit (code=0) at ipc.c:111
        __func__ = "proc_exit"
#8  0x0000556782e2e2f8 in HandleCheckpointerInterrupts () at checkpointer.c:602
No locals.
#9  0x0000556782e2ddbc in CheckpointerMain (startup_data=0x0, 
startup_data_len=0) at checkpointer.c:352
        flags = 0
        chkpt_or_rstpt_timed = false
        do_checkpoint = false
        now = 1733270354
        elapsed_secs = 0
        cur_timeout = 300
        chkpt_or_rstpt_requested = false
        local_sigjmp_buf = {{__jmpbuf = {93903069464336, 6793796310280149260, 
93903060855008, 140733362963120, 0, 0, 6793796303604914444, 
826824280619063564}, __mask_was_saved = 1, __saved_mask = {__val = 
{18446744066192964099, 0, 0, 0, 139885179003583, 12884901904, 140733362962512, 
140733362962448, 10474404646378503936, 0, 10474404646378503936, 1, 
93903066625472, 139885179008032, 93903961452816, 140733362962512}}}}
        checkpointer_context = 0x5567b84944a0
        __func__ = "CheckpointerMain"
#10 0x0000556782e2f63f in postmaster_child_launch (child_type=B_CHECKPOINTER, 
startup_data=0x0, startup_data_len=0, client_sock=0x0) at launch_backend.c:277
        pid = 0
#11 0x0000556782e35e05 in StartChildProcess (type=B_CHECKPOINTER) at 
postmaster.c:3928
        pid = -1856204921
        __func__ = "StartChildProcess"
#12 0x0000556782e321d5 in PostmasterMain (argc=8, argv=0x5567b843f050) at 
postmaster.c:1357
        opt = -1
        status = 0
        userDoption = 0x5567b846c6a0 
"/home/postgres/projects/11927-asan-van17/postgrespro/contrib/test_decoding/tmp_check_iso/data"
        listen_addr_saved = false
        output_config_variable = 0x0
        __func__ = "PostmasterMain"
#13 0x0000556782ce6c64 in main (argc=8, argv=0x5567b843f050) at main.c:197
        do_check_root = true

Reply via email to