Hi hackers,

I believe we found a bug in logical decoding. It only occurs with casserts
enabled. It was originally discovered and reproduced by Murat Kabilov and
Ildus Kurbangaliev. Here is the stack trace we've got:
#0  0x00007facc66ef82f in raise () from /usr/lib/libc.so.6
#1  0x00007facc66da672 in abort () from /usr/lib/libc.so.6
#2  0x0000000000ac4ebf in ExceptionalCondition (conditionName=0xccdea8 "!(prev_first_lsn < cur_txn->first_lsn)", errorType=0xccdce4 "FailedAssertion", fileName=0xccdd38 "reorderbuffer.c", lineNumber=680) at assert.c:54
#3  0x00000000008a9515 in AssertTXNLsnOrder (rb=0x25ca128) at reorderbuffer.c:680
#4  0x00000000008a900f in ReorderBufferTXNByXid (rb=0x25ca128, xid=65609, create=true, is_new=0x0, lsn=211590864, create_as_top=true) at reorderbuffer.c:559
#5  0x00000000008abf0d in ReorderBufferAddNewTupleCids (rb=0x25ca128, xid=65609, lsn=211590864, node=..., tid=..., cmin=0, cmax=4294967295, combocid=4294967295) at reorderbuffer.c:2098
#6  0x00000000008b096b in SnapBuildProcessNewCid (builder=0x25d0158, xid=65610, lsn=211590864, xlrec=0x25d60b8) at snapbuild.c:781
#7  0x000000000089d01c in DecodeHeap2Op (ctx=0x25ba0b8, buf=0x7ffd0e294da0) at decode.c:382
#8  0x000000000089c8ca in LogicalDecodingProcessRecord (ctx=0x25ba0b8, record=0x25ba378) at decode.c:125
#9  0x00000000008a124c in DecodingContextFindStartpoint (ctx=0x25ba0b8) at logical.c:492
#10 0x00000000008b9c3d in CreateReplicationSlot (cmd=0x257be20) at walsender.c:957
#11 0x00000000008baa60 in exec_replication_command (cmd_string=0x24f5b08 "CREATE_REPLICATION_SLOT temp_slot_name TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1531
#12 0x0000000000937230 in PostgresMain (argc=1, argv=0x25233a8, dbname=0x2523380 "postgres", username=0x24f23c8 "zilder") at postgres.c:4245
#13 0x0000000000881453 in BackendRun (port=0x251a900) at postmaster.c:4431
#14 0x0000000000880b4f in BackendStartup (port=0x251a900) at postmaster.c:4122
#15 0x000000000087cbbe in ServerLoop () at postmaster.c:1704
#16 0x000000000087c34a in PostmasterMain (argc=3, argv=0x24f0330) at postmaster.c:1377
#17 0x00000000007926b6 in main (argc=3, argv=0x24f0330) at main.c:228

After viewing the coredump we see that `prev_first_lsn == cur_txn->first_lsn`.
The problem seems to be that ReorderBuffer adds two ReorderBufferTXNs with the
same LSN but different transaction ids: a subxid and the top-level xid. See the
FIX part below.
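For context, the assertion that fires (AssertTXNLsnOrder() at
reorderbuffer.c:680) walks the reorder buffer's toplevel_by_lsn list and
requires each transaction's first_lsn to be strictly greater than the previous
one. A simplified sketch of that check, based on the assertion text in the
trace (not the verbatim source):

    /* simplified from AssertTXNLsnOrder(); not verbatim source */
    XLogRecPtr  prev_first_lsn = InvalidXLogRecPtr;
    dlist_iter  iter;

    /* walk the top-level transactions, ordered by their first LSN */
    dlist_foreach(iter, &rb->toplevel_by_lsn)
    {
        ReorderBufferTXN *cur_txn = dlist_container(ReorderBufferTXN, node, iter.cur);

        /* each entry must start strictly after the previous one */
        if (prev_first_lsn != InvalidXLogRecPtr)
            Assert(prev_first_lsn < cur_txn->first_lsn);

        prev_first_lsn = cur_txn->first_lsn;
    }

Since the second ReorderBufferTXN is created with exactly the same first_lsn
as the entry already on the list, the strict '<' comparison fails.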
STEPS TO REPRODUCE
------------------

We were able to reproduce it on 10, 11 and on the master branch. Postgres was
configured as:

./configure --enable-cassert CFLAGS='-ggdb3 -O0' --prefix=$HOME/pg12

Additional options in postgresql.conf:

wal_level='logical'
max_connections=1000
max_replication_slots=100
max_wal_senders=100
max_logical_replication_workers=100

pgbench scripts:

$ cat create_table.sql
BEGIN;
SAVEPOINT p1;
CREATE temp TABLE t_t (id INT) ON COMMIT DROP;
ROLLBACK TO SAVEPOINT p1;
ROLLBACK;

$ cat create_slot.sql
BEGIN ISOLATION LEVEL REPEATABLE READ READ ONLY;
SELECT pg_create_logical_replication_slot('test' || pg_backend_pid(), 'pgoutput', true);
SELECT pg_drop_replication_slot('test' || pg_backend_pid());
ROLLBACK;

Run in parallel terminals:

$HOME/pg12/bin/pgbench postgres -f create_table.sql -T1000 -c50 -j50
$HOME/pg12/bin/pgbench postgres -f create_slot.sql -T1000 -c50 -j50

It may take some time; on my local machine it breaks within a few seconds.

FIX?
----

I can't say that I have enough understanding of what's going on in the logical
decoding code, but the one thing I've noticed is an inconsistency in the xids
used for the ReorderBufferTXNByXid() calls:

1. first, in DecodeHeap2Op(), ReorderBufferProcessXid() is called with the
   subtransaction id; it actually creates a ReorderBufferTXN and adds it to
   the reorder buffer's hash table and toplevel_by_lsn list;

2. second, ReorderBufferXidSetCatalogChanges() uses the same subxid to look up
   the ReorderBufferTXN created before, successfully;

3. now ReorderBufferAddNewTupleCids() instead uses the top-level transaction
   id for the lookup; it cannot find that xid in the hash table and tries to
   add a new entry with the same LSN, which fails because this LSN is already
   in the toplevel_by_lsn list (see the call-site sketch after the patch below
   for where the two xids come from).

Attached is a simple patch that uses the subxid instead of the top-level xid
in the ReorderBufferAddNewTupleCids() call. It seems to fix the bug, but I'm
not sure this is a valid change. Can someone please verify it or maybe suggest
a better solution for the issue?

Best regards,
Ildar
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 3e9d4cd79f..6c03c85768 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -778,7 +778,7 @@ SnapBuildProcessNewCid(SnapBuild *builder, TransactionId xid,
 	 */
 	ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
 
-	ReorderBufferAddNewTupleCids(builder->reorder, xlrec->top_xid, lsn,
+	ReorderBufferAddNewTupleCids(builder->reorder, xid, lsn,
 								 xlrec->target_node, xlrec->target_tid,
 								 xlrec->cmin, xlrec->cmax,
 								 xlrec->combocid);
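For reference, here is roughly where the two xids come from, condensed from
the decode path shown in stack frames #5-#7 above; this is a simplified
sketch, not the verbatim source:

    /* Condensed from DecodeHeap2Op() (decode.c); simplified sketch */
    TransactionId    xid = XLogRecGetXid(buf->record);  /* xid that logged the
                                                         * record, i.e. the
                                                         * subtransaction */
    SnapBuild       *builder = ctx->snapshot_builder;

    ReorderBufferProcessXid(ctx->reorder, xid, buf->origptr);
    ...
    /* for an XLOG_HEAP2_NEW_CID record: */
    xl_heap_new_cid *xlrec = (xl_heap_new_cid *) XLogRecGetData(buf->record);

    /* xlrec->top_xid carries the top-level transaction's xid */
    SnapBuildProcessNewCid(builder, xid, buf->origptr, xlrec);

In the stack trace above, SnapBuildProcessNewCid() is called with xid=65610
(the subtransaction), while ReorderBufferAddNewTupleCids() receives 65609
(xlrec->top_xid, the top-level transaction), which matches the subxid /
top-level xid mismatch described in the FIX part.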