On Wednesday, June 14, 2023 5:05 PM Tomas Vondra <tomas.von...@enterprisedb.com> wrote: > On 6/14/23 05:15, Zhijie Hou (Fujitsu) wrote: > > On Tuesday, June 13, 2023 12:19 PM Zhijie Hou (Fujitsu) > <houzj.f...@fujitsu.com> wrote: > >> > >> On Tuesday, June 13, 2023 12:04 PM Amit Kapila > >> <amit.kapil...@gmail.com> > >> wrote: > >>> > >>> On Wed, Jun 7, 2023 at 6:02 PM Tomas Vondra > >>> <tomas.von...@enterprisedb.com> wrote: > >>>> > >>>> > >>>> Well, I think the issue is pretty clear - we end up with an initial > >>>> snapshot that's in between the ASSIGNMENT and NEW_CID, and because > >>>> NEW_CID has both xact and subxact XID it fails because we add two > >>>> TXNs with the same LSN, not realizing one of them is subxact. > >>>> > >>>> That's obviously wrong, although somewhat benign in production > >>>> because it only fails because of hitting an assert. > >>>> > >>> > >>> Doesn't this indicate that we can end up decoding a partial > >>> transaction when we restore a snapshot? Won't that be a problem even > >>> for > >> production? > >> > >> Yes, I think it can cause the problem that only partial changes of a > >> transaction are streamed. > >> I tried to reproduce this and here are the steps. Note, to make sure > >> the test won't be affected by other running_xact WALs, I changed > >> LOG_SNAPSHOT_INTERVAL_MS to a bigger number. > >> > >> session 1: > >> ----- > >> create table test(a int); > >> SELECT 'init' FROM > >> pg_create_logical_replication_slot('isolation_slot_1', > >> 'test_decoding'); > >> ----- > >> > >> session 2: > >> ----- > >> - Start a transaction > >> BEGIN; > >> INSERT INTO test VALUES(1); > >> ----- > >> > >> session 3: > >> ----- > >> - Create another slot isolation_slot_2, it should choose a > >> restart_point which is > >> - after the changes that happened in session 2. Note, to let the > >> current slot > >> - restore another snapshot, we need to use gdb to block the current > >> backend at > >> - SnapBuildFindSnapshot(), the backend should have logged the > >> running_xacts WAL > >> - before reaching SnapBuildFindSnapshot. > >> > >> SELECT 'init' FROM > >> pg_create_logical_replication_slot('isolation_slot_2', > >> 'test_decoding'); > >> ----- > >> > >> session 1: > >> ----- > >> - Since there is a running_xacts which session 3 logged, the current > >> backend will > >> - serialize the snapshot when decoding the running_xacts WAL, and the > >> snapshot > >> - can be used by other slots (e.g. isolation_slot_2) > >> > >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_1', > >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); > >> ----- > >> > >> session 2: > >> ----- > >> - Insert some different data and commit the transaction. > >> > >> INSERT INTO test VALUES(2); > >> INSERT INTO test VALUES(3); > >> INSERT INTO test VALUES(4); > >> COMMIT > >> ----- > >> > >> session 3: > >> ----- > >> - Release the process and try to stream the changes, since the > >> restart point is > >> - at the middle of the transaction, it will stream partial changes of > >> the > >> - transaction which was committed in session 2: > >> > >> SELECT data FROM pg_logical_slot_get_changes('isolation_slot_2', > >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); > >> ----- > >> > >> Results (partial streamed changes): > >> postgres=# SELECT data FROM > >> pg_logical_slot_get_changes('isolation_slot_2', > >> NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0'); > >> data > >> ----------------------------------------- > >> BEGIN > >> table public.test: INSERT: a[integer]:2 table public.test: INSERT: > >> a[integer]:3 table public.test: INSERT: a[integer]:4 COMMIT > >> (5 rows) > >> > > > > One idea to fix the partial change stream problem would be that we > > record all the running transaction's xid when restoring the snapshot > > in SnapBuildFindSnapshot(), and in the following decoding, we skip > > decoding changes for the recorded transaction. Or we can do similar to > > 7f13ac8(serialize the information of running xacts if any) > > > > We need to think about how to fix this in backbranches, and the idea with > serializing running transactions seems rather unbackpatchable (as it changes > on-disk state). > > > But one point I am not very sure is that we might retore snapshot in > > SnapBuildSerializationPoint() as well where we don't have running > > transactions information. Although SnapBuildSerializationPoint() is > > invoked for XLOG_END_OF_RECOVERY and XLOG_CHECKPOINT_SHUTDOWN > records > > which seems no running transaction will be there when logging. But I > > am not 100% sure if the problem can happen in this case as well. > > ... > > Also, can you try if we still stream the partial transaction with > create_logical_replication_slot building a full snapshot?
Yes, It can fix this problem because force create_logical_replication_slot build a full snapshot can avoid restoring the snapshot. But I am not sure if this is the best fix for this for the same reason(it's costly) mentioned by Amit[1]. [1] https://www.postgresql.org/message-id/CAA4eK1Kro0rej%3DZXMhcdjs%2BaYsZvNywu3-cqdRUtyAp4zqpVWw%40mail.gmail.com Best Regards, Hou zj