On Thursday, September 16, 2021 8:36 PM Amit Kapila <amit.kapil...@gmail.com>:
> On Thu, Sep 16, 2021 at 6:29 AM houzj.f...@fujitsu.com
> <houzj.f...@fujitsu.com> wrote:
> >
> > After applying the patch,
> > I saw the same problem and can reproduce it by the following steps:
> >
> > 1) execute the SQLs.
> > -----------SQL-----------
> > CREATE TABLE decoding_test(x integer, y text);
> > SELECT pg_create_logical_replication_slot('test_slot', 'test_decoding');
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(1,4) s;
> >
> > -- use the lsn here to execute pg_recvlogical later
> > SELECT lsn FROM pg_logical_slot_peek_changes('test_slot', NULL, NULL) ORDER BY lsn DESC LIMIT 1;
> > INSERT INTO decoding_test(x,y) SELECT s, s::text FROM generate_series(5,50) s;
> > ----------------------
> >
> > 2) Then, if I execute the following command twice:
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 708
> > table public.decoding_test: INSERT: x[integer]:1 y[text]:'1'
> > table public.decoding_test: INSERT: x[integer]:2 y[text]:'2'
> > table public.decoding_test: INSERT: x[integer]:3 y[text]:'3'
> > table public.decoding_test: INSERT: x[integer]:4 y[text]:'4'
> > COMMIT 708
> >
> > # pg_recvlogical -E lsn -d postgres -S 'test_slot' --start --no-loop -f -
> > BEGIN 709
> >
> > It still generated ' BEGIN 709' in the second time execution.
> > But it will output nothing in the second time execution without the patch.
> >
>
> I think here the reason is that the first_lsn of a transaction is
> always equal to end_lsn of the previous transaction (See comments
> above first_lsn and end_lsn fields of ReorderBufferTXN). I have not
> debugged but I think in StreamLogicalLog() the cur_record_lsn after
> receiving 'w' message, in this case, will be equal to endpos whereas
> we expect to be greater than endpos to exit. Before the patch, it will
> always get the 'k' message where we expect the received lsn to be
> equal to endpos to conclude that we can exit. Do let me know if your
> analysis differs?

After debugging it, I agree with your analysis.

WITH the patch:
In function StreamLogicalLog(), I can see that cur_record_lsn is equal to
endpos when the 'w' message is received. The exit check for 'w' messages
expects cur_record_lsn to be greater than endpos, so the loop continues
and the unexpected record ('BEGIN 709') is output.

WITHOUT the patch:
In function StreamLogicalLog(), it first receives a 'k' message, which
breaks the loop via the following code.

    if (endposReached)
    {
        prepareToTerminate(conn, endpos, true, InvalidXLogRecPtr);
        time_to_abort = true;
        break;
    }

Best regards,
Hou zj
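
P.S. To make the boundary condition discussed above concrete, here is a minimal standalone sketch in C. It is not the pg_recvlogical source: the two helper names are hypothetical, the XLogRecPtr typedef is only a stand-in, and the comparisons are my reading of the analysis in this thread (a 'w' message must be past endpos to exit, while a 'k' message only needs to reach it).

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr, so the sketch compiles on its own. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper modelling the 'w' (data) message path as described
 * in this thread: exit only when the record LSN is strictly past endpos.
 */
bool
data_msg_reaches_end(XLogRecPtr cur_record_lsn, XLogRecPtr endpos)
{
    return cur_record_lsn > endpos;
}

/*
 * Hypothetical helper modelling the 'k' (keepalive) message path as
 * described in this thread: exit once the reported LSN has reached endpos.
 */
bool
keepalive_reaches_end(XLogRecPtr wal_end, XLogRecPtr endpos)
{
    return wal_end >= endpos;
}
```

Because the first_lsn of a transaction equals the end_lsn of the previous one, the BEGIN of the next transaction arrives with an LSN equal to endpos. With a made-up endpos of 0x1000, data_msg_reaches_end(0x1000, 0x1000) is false, so the record is written, whereas keepalive_reaches_end(0x1000, 0x1000) is true, so the loop would have stopped first.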