On 17.05.2018 10:45, Konstantin Knizhnik wrote:
We got the following assertion failure at our buildfarm of master branch of Postgres in contrib/test_decoding regression tests:

2018-05-07 19:50:07.241 MSK [5af083bf.54ae:49] DETAIL:  Streaming transactions committing after 0/2A00000, reading WAL from 0/29FFF1C.
2018-05-07 19:50:07.241 MSK [5af083bf.54ae:50] STATEMENT:  SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174')
TRAP: FailedAssertion("!(((RecPtr) % 8192 >= (((uintptr_t) ((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", File: "xlogreader.c", Line: 241)


The stack trace is the following:

$ gdb -x ./gdbcmd --batch 
pgsql.build/tmp_install/home/buildfarm/build-farm/CORE-353-stable-func/inst/bin/postgres
 pgsql.build/contrib/test_decoding/tmp_check/data/core
[New LWP 21678]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: buildfarm regression [local] SELECT            
                     '.
Program terminated with signal SIGABRT, Aborted.
#0  0xf7722c89 in __kernel_vsyscall ()
#0  0xf7722c89 in __kernel_vsyscall ()
#1  0xf6b5ddd0 in __libc_signal_restore_set (set=0xffaf2240) at 
../sysdeps/unix/sysv/linux/nptl-signals.h:79
#2  __GI_raise (sig=6) at ../sysdeps/unix/sysv/linux/raise.c:48
#3  0xf6b5f297 in __GI_abort () at abort.c:89
#4  0x56b3931a in ExceptionalCondition (conditionName=0x56bd0c38 "!(((RecPtr) % 8192 >= (((uintptr_t) 
((sizeof(XLogPageHeaderData))) + ((4) - 1)) & ~((uintptr_t) ((4) - 1)))))", errorType=0x56b8bf61 
"FailedAssertion", fileName=0x56bd0df0 "xlogreader.c", lineNumber=241) at assert.c:54
#5  0x5678c573 in XLogReadRecord (state=0x57628c84, RecPtr=44040192, 
errormsg=0xffaf2560) at xlogreader.c:241
#6  0x569c3191 in pg_logical_replication_slot_advance (startlsn=<optimized 
out>, moveto=44040564) at slotfuncs.c:370
#7  0x569c3c8e in pg_replication_slot_advance (fcinfo=0xffaf2708) at 
slotfuncs.c:487
#8  0x568a69c0 in ExecMakeTableFunctionResult (setexpr=0x57626e30, 
econtext=0x57626d88, argContext=0x57620b48, expectedDesc=0x57627e44, 
randomAccess=false) at execSRF.c:231
#9  0x568b41d3 in FunctionNext (node=0x57626cfc) at nodeFunctionscan.c:94
#10 0x568a5ce2 in ExecScanFetch (recheckMtd=0x568b3ec0 <FunctionRecheck>, 
accessMtd=0x568b3ed0 <FunctionNext>, node=0x57626cfc) at execScan.c:95
#11 ExecScan (node=0x57626cfc, accessMtd=0x568b3ed0 <FunctionNext>, 
recheckMtd=0x568b3ec0 <FunctionRecheck>) at execScan.c:162
#12 0x568b4243 in ExecFunctionScan (pstate=0x57626cfc) at nodeFunctionscan.c:270
#13 0x5689caba in ExecProcNode (node=0x57626cfc) at 
../../../src/include/executor/executor.h:238
#14 ExecutePlan (execute_once=<optimized out>, dest=0x0, direction=NoMovementScanDirection, 
numberTuples=<optimized out>, sendTuples=<optimized out>, operation=CMD_SELECT, 
use_parallel_mode=<optimized out>, planstate=0x57626cfc, estate=0x57626bf0) at execMain.c:1731
#15 standard_ExecutorRun (queryDesc=0x5760f248, direction=ForwardScanDirection, 
count=0, execute_once=true) at execMain.c:368
#16 0x56a132cd in PortalRunSelect (portal=portal@entry=0x575c4ea8, 
forward=forward@entry=true, count=0, count@entry=2147483647, dest=0x576232d4) 
at pquery.c:932
#17 0x56a14b00 in PortalRun (portal=0x575c4ea8, count=2147483647, isTopLevel=true, 
run_once=true, dest=0x576232d4, altdest=0x576232d4, completionTag=0xffaf2c40 
"") at pquery.c:773
#18 0x56a0fbb7 in exec_simple_query (query_string=query_string@entry=0x57579070 
"SELECT end_lsn FROM pg_replication_slot_advance('regression_slot1', '0/2A00174') 
") at postgres.c:1122
#19 0x56a11a8e in PostgresMain (argc=1, argv=0x575a0b8c, dbname=<optimized out>, 
username=0x575a09f0 "buildfarm") at postgres.c:4153
#20 0x566cd9cb in BackendRun (port=0x5759a358) at postmaster.c:4361
#21 BackendStartup (port=0x5759a358) at postmaster.c:4033
#22 ServerLoop () at postmaster.c:1706
#23 0x5698a608 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) 
at postmaster.c:1379
#24 0x566cf642 in main (argc=<optimized out>, argv=<optimized out>) at 
main.c:228



As you can see, the assertion failure happens because the specified startlsn (0x2A00000) is not considered valid.
This LSN is taken from the slot's confirmed flush LSN in pg_replication_slot_advance:

        startlsn = MyReplicationSlot->data.confirmed_flush;


Unfortunately I was not able to reproduce the problem even after repeating these regression tests 1000 times, so it seems to be a very hard-to-reproduce, non-deterministic bug.
I wonder whether there is a guarantee that confirmed_flush always points to the start of a record.
Inspecting the xlogreader.c code shows that confirmed_flush is, for example, assigned EndRecPtr in DecodingContextFindStartpoint.
EndRecPtr is updated in XLogReadRecord, and if the record doesn't cross a page boundary, the following formula is used:

                state->EndRecPtr = RecPtr + MAXALIGN(total_len);

So if a record ends exactly at a page boundary, it looks like EndRecPtr can point to the page boundary.
This is confirmed by the following comment in the XLogReadRecord function:

                /*
                 * RecPtr is pointing to end+1 of the previous WAL record.  If we're
                 * at a page boundary, no more records can fit on the current page. We
                 * must skip over the page header, but we can't do that until we've
                 * read in the page, since the header size is variable.
                 */

But this means it is possible for confirmed_flush to also point to a page boundary, which causes this assertion failure in XLogReadRecord:

                /*
                 * Caller supplied a position to start at.
                 *
                 * In this case, the passed-in record pointer should already be
                 * pointing to a valid record starting position.
                 */
                Assert(XRecOffIsValid(RecPtr));


Maybe this bug is so difficult to reproduce because it occurs only when the WAL record preceding the specified start point ends exactly at a page boundary. The probability of that seems to be quite small...

I am not so familiar with the WAL iteration code and logical decoding, so I would be pleased to receive confirmation or refutation of my analysis of the problem.
--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Looks like this is not the first time this problem has been encountered:

https://www.postgresql.org/message-id/flat/34d66f63-40a9-4c3e-c9a1-248d1e393d29%40enterprisedb.com#34d66f63-40a9-4c3e-c9a1-248d1e393...@enterprisedb.com

I wonder who is now maintaining the logical replication code and whether this bug is going to be fixed. I think that using restart_lsn instead of confirmed_flush is not the right approach. If restart_lsn is not available and confirmed_flush is pointing to a page boundary, then we should in any case handle this situation and adjust startlsn to point to a valid record position (by just adding the page header size?).

