On Wed, Oct 19, 2022 at 11:58 AM Masahiko Sawada <sawada.m...@gmail.com> wrote:
>
> On Tue, Oct 18, 2022 at 9:53 PM Masahiko Sawada <sawada.m...@gmail.com> wrote:
> >
> > On Tue, Oct 18, 2022 at 7:49 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> > >
> > > On Tue, Oct 18, 2022 at 1:45 PM Masahiko Sawada <sawada.m...@gmail.com> 
> > > wrote:
> > > >
> > > > >
> > > > > I think because the test case proposed needs all three changes, we can
> > > > > push the change-1 without a test case and then as a second patch have
> > > > > change-2 for HEAD and change-3 for back branches with the test case.
> > > > > Do you have any other ideas to proceed here?
> > > >
> > > > I found another test case that causes the assertion failure at
> > > > "Assert(!needs_snapshot || needs_timetravel);" on all branches. I've
> > > > attached the patch for the test case. In this test case, I modified a
> > > > user-catalog table instead of system-catalog table. That way, we don't
> > > > generate invalidation messages while generating NEW_CID records. As a
> > > > result, we mark only the subtransactions as containing catalog change
> > > > and don't make association between top-level and sub transactions. The
> > > > assertion failure happens on all supported branches. If we need to fix
> > > > this (I believe so), Change-2 needs to be backpatched to all supported
> > > > branches.
> > > >
> > > > There are three changes as Amit mentioned, and regarding the test
> > > > case, we have three test cases I've attached: truncate_testcase.patch,
> > > > analyze_testcase.patch, uesr_catalog_testcase.patch. The relationship
> > > > between assertion failures and test cases are very complex. I could
> > > > not find any test case to cause only one assertion failure on all
> > > > branches. One idea to proceed is:
> > > >
> > > > Patch-1 includes Change-1 and is applied to all branches.
> > > >
> > > > Patch-2 includes Change-2 and the user_catalog test case, and is
> > > > applied to all branches.
> > > >
> > > > Patch-3 includes Change-3 and the truncate test case (or the analyze
> > > > test case), and is applied to v14 and v15 (also till v11 if we
> > > > prefer).
> > > >
> > > > The patch-1 doesn't include any test case but the user_catalog test
> > > > case can test both Change-1 and Change-2 on all branches.
> > > >
> > >
> > > I was wondering if it makes sense to commit both Change-1 and Change-2
> > > together as one patch? Both assertions are caused by a single test
> > > case and are related to the general problem that the association of
> > > top and sub transaction is only guaranteed to be formed before we
> > > decode transaction changes. Also, it would be good to fix the problem
> > > with a test case that can cause it. What do you think?
> >
> > Yeah, it makes sense to me.
> >
>
> I've attached two patches that need to be back-patched to all branches
> and includes Change-1, Change-2, and a test case for them. FYI this
> patch resolves the assertion failure reported in this thread as well
> as one reported in another thread[2]. So I borrowed some of the
> changes from the patch[2] Osumi-san recently proposed.
>

I've attached patches for Change-3 with a test case. Please review them as well.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From 763cb604fae97131b5be2da36deb10054b6dbf3a Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 19 Oct 2022 12:16:43 +0900
Subject: [PATCH v1] Don't assign subtransactions to top transaction in
 SnapBuildXidSetCatalogChanges().

Previously, when decoding the commit record of the transaction, we
mark both top-level transaction and its subtransactions as containing
catalog changes and assign the subtransactions to the top-level
transaction. However, if the subtransacitons have invalidation
messages, we missed executing them when forgetting the
transactions. In commit 272248a0c1 where we introduced
SnapBuildXidSetCatalogChanges(), the reason why we assigned them is
just to avoid the assertion failure in AssertTXNLsnOrder() as they
have the same LSN. Now that with commit XXX we skip this assertion
check until we reach the LSN at we start decoding the contents of the
transaciton, there is no reason for that anymore.

SnapBuildXidSetCatalogChanges() was introduced in 15 or older but this
bug doesn't exist in branches prior to 14 since we don't add
invalidation messages to subtransactions. We decided to backpatch
through 11 for consistency but don't for 10 since its final release
will come soon.

Reported-by: Osumi Takamichi
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Backpatch: Backpatch-through: 11
---
 .../expected/catalog_change_snapshot.out      | 45 +++++++++++++++++++
 .../specs/catalog_change_snapshot.spec        |  8 ++++
 src/backend/replication/logical/snapbuild.c   |  7 +--
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index 1d75cf5af0..b33e49c0b1 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -87,3 +87,48 @@ COMMIT
 stop    
 (1 row)
 
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init    
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)                      
+COMMIT                                                       
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT                                                       
+(3 rows)
+
+?column?
+--------
+stop    
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index 2ad1edeaa8..b3ccb9e0e9 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -53,3 +53,11 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
 # transaction to do timetravel since one of its subtransactions has been marked as
 # containing catalog changes.
 permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_insert2" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# The last decoding restarts from the first checkpoint, and add invalidation messages
+# generated by "s0_truncate" to the subtransaction. When decoding the commit record of
+# the top-level transaction, we mark both top-level transaction and its subtransactions
+# as containing catalog changes. However, we check if we don't create the association
+# between top-level and subtransactions at this time. Otherwise, we miss executing
+# invalidation messages when forgetting the transaction.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 1e7c918bde..14ce0d0523 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -2145,10 +2145,11 @@ SnapBuildXidSetCatalogChanges(SnapBuild *builder, TransactionId xid, int subxcnt
 	if (bsearch(&xid, InitialRunningXacts, NInitialRunningXacts,
 				sizeof(TransactionId), xidComparator) != NULL)
 	{
+		/*
+		 * We will assign subtransactions to the top transaction before
+		 * replaying the contents of the transaction.
+		 */
 		for (int i = 0; i < subxcnt; i++)
-		{
-			ReorderBufferAssignChild(builder->reorder, xid, subxacts[i], lsn);
 			ReorderBufferXidSetCatalogChanges(builder->reorder, subxacts[i], lsn);
-		}
 	}
 }
-- 
2.31.1

From f1953aaf460e6c80c5f7680a291917b7d3c20ba4 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 19 Oct 2022 12:16:43 +0900
Subject: [PATCH v1] Don't assign subtransactions to top transaction in
 SnapBuildXidSetCatalogChanges().

Previously, when decoding the commit record of the transaction, we
mark both top-level transaction and its subtransactions as containing
catalog changes and assign the subtransactions to the top-level
transaction. However, if the subtransacitons have invalidation
messages, we missed executing them when forgetting the
transactions. In commit 272248a0c1 where we introduced
SnapBuildXidSetCatalogChanges(), the reason why we assigned them is
just to avoid the assertion failure in AssertTXNLsnOrder() as they
have the same LSN. Now that with commit XXX we skip this assertion
check until we reach the LSN at we start decoding the contents of the
transaciton, there is no reason for that anymore.

SnapBuildXidSetCatalogChanges() was introduced in 15 or older but this
bug doesn't exist in branches prior to 14 since we don't add
invalidation messages to subtransactions. We decided to backpatch
through 11 for consistency but don't for 10 since its final release
will come soon.

Reported-by: Osumi Takamichi
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Backpatch: Backpatch-through: 11
---
 .../expected/catalog_change_snapshot.out      | 45 +++++++++++++++++++
 .../specs/catalog_change_snapshot.spec        |  8 ++++
 src/backend/replication/logical/snapbuild.c   |  7 +--
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index 1d75cf5af0..b33e49c0b1 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -87,3 +87,48 @@ COMMIT
 stop    
 (1 row)
 
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init    
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)                      
+COMMIT                                                       
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT                                                       
+(3 rows)
+
+?column?
+--------
+stop    
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index 2ad1edeaa8..b3ccb9e0e9 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -53,3 +53,11 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
 # transaction to do timetravel since one of its subtransactions has been marked as
 # containing catalog changes.
 permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_insert2" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# The last decoding restarts from the first checkpoint, and add invalidation messages
+# generated by "s0_truncate" to the subtransaction. When decoding the commit record of
+# the top-level transaction, we mark both top-level transaction and its subtransactions
+# as containing catalog changes. However, we check if we don't create the association
+# between top-level and subtransactions at this time. Otherwise, we miss executing
+# invalidation messages when forgetting the transaction.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index 80b7d86e79..3204e1e3a1 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -2119,10 +2119,11 @@ SnapBuildXidSetCatalogChanges(SnapBuild *builder, TransactionId xid, int subxcnt
 	{
 		ReorderBufferXidSetCatalogChanges(builder->reorder, xid, lsn);
 
+		/*
+		 * We will assign subtransactions to the top transaction before
+		 * replaying the contents of the transaction.
+		 */
 		for (int i = 0; i < subxcnt; i++)
-		{
-			ReorderBufferAssignChild(builder->reorder, xid, subxacts[i], lsn);
 			ReorderBufferXidSetCatalogChanges(builder->reorder, subxacts[i], lsn);
-		}
 	}
 }
-- 
2.31.1

From d60b190a7a64600bb952c42a3565973d96bcf456 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada <sawada.mshk@gmail.com>
Date: Wed, 19 Oct 2022 12:16:43 +0900
Subject: [PATCH v1] Don't assign subtransactions to top transaction in
 SnapBuildXidSetCatalogChanges().

Previously, when decoding the commit record of the transaction, we
mark both top-level transaction and its subtransactions as containing
catalog changes and assign the subtransactions to the top-level
transaction. However, if the subtransacitons have invalidation
messages, we missed executing them when forgetting the
transactions. In commit 272248a0c1 where we introduced
SnapBuildXidSetCatalogChanges(), the reason why we assigned them is
just to avoid the assertion failure in AssertTXNLsnOrder() as they
have the same LSN. Now that with commit XXX we skip this assertion
check until we reach the LSN at we start decoding the contents of the
transaciton, there is no reason for that anymore.

SnapBuildXidSetCatalogChanges() was introduced in 15 or older but this
bug doesn't exist in branches prior to 14 since we don't add
invalidation messages to subtransactions. We decided to backpatch
through 11 for consistency but don't for 10 since its final release
will come soon.

Reported-by: Osumi Takamichi
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/TYAPR01MB58660803BCAA7849C8584AA4F57E9%40TYAPR01MB5866.jpnprd01.prod.outlook.com
Backpatch: Backpatch-through: 11
---
 .../expected/catalog_change_snapshot.out      | 45 +++++++++++++++++++
 .../specs/catalog_change_snapshot.spec        |  8 ++++
 src/backend/replication/logical/snapbuild.c   |  7 +--
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/contrib/test_decoding/expected/catalog_change_snapshot.out b/contrib/test_decoding/expected/catalog_change_snapshot.out
index 1d75cf5af0..b33e49c0b1 100644
--- a/contrib/test_decoding/expected/catalog_change_snapshot.out
+++ b/contrib/test_decoding/expected/catalog_change_snapshot.out
@@ -87,3 +87,48 @@ COMMIT
 stop    
 (1 row)
 
+
+starting permutation: s0_init s0_begin s0_savepoint s0_insert s1_checkpoint s1_get_changes s0_truncate s0_commit s0_begin s0_insert s1_checkpoint s1_get_changes s0_commit s1_get_changes
+step s0_init: SELECT 'init' FROM pg_create_logical_replication_slot('isolation_slot', 'test_decoding');
+?column?
+--------
+init    
+(1 row)
+
+step s0_begin: BEGIN;
+step s0_savepoint: SAVEPOINT sp1;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data
+----
+(0 rows)
+
+step s0_truncate: TRUNCATE tbl1;
+step s0_commit: COMMIT;
+step s0_begin: BEGIN;
+step s0_insert: INSERT INTO tbl1 VALUES (1);
+step s1_checkpoint: CHECKPOINT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+table public.tbl1: TRUNCATE: (no-flags)                      
+COMMIT                                                       
+(4 rows)
+
+step s0_commit: COMMIT;
+step s1_get_changes: SELECT data FROM pg_logical_slot_get_changes('isolation_slot', NULL, NULL, 'skip-empty-xacts', '1', 'include-xids', '0');
+data                                                         
+-------------------------------------------------------------
+BEGIN                                                        
+table public.tbl1: INSERT: val1[integer]:1 val2[integer]:null
+COMMIT                                                       
+(3 rows)
+
+?column?
+--------
+stop    
+(1 row)
+
diff --git a/contrib/test_decoding/specs/catalog_change_snapshot.spec b/contrib/test_decoding/specs/catalog_change_snapshot.spec
index 2ad1edeaa8..b3ccb9e0e9 100644
--- a/contrib/test_decoding/specs/catalog_change_snapshot.spec
+++ b/contrib/test_decoding/specs/catalog_change_snapshot.spec
@@ -53,3 +53,11 @@ permutation "s0_init" "s0_begin" "s0_savepoint" "s0_truncate" "s1_checkpoint" "s
 # transaction to do timetravel since one of its subtransactions has been marked as
 # containing catalog changes.
 permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_insert2" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
+
+# The last decoding restarts from the first checkpoint, and add invalidation messages
+# generated by "s0_truncate" to the subtransaction. When decoding the commit record of
+# the top-level transaction, we mark both top-level transaction and its subtransactions
+# as containing catalog changes. However, we check if we don't create the association
+# between top-level and subtransactions at this time. Otherwise, we miss executing
+# invalidation messages when forgetting the transaction.
+permutation "s0_init" "s0_begin" "s0_savepoint" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_truncate" "s0_commit" "s0_begin" "s0_insert" "s1_checkpoint" "s1_get_changes" "s0_commit" "s1_get_changes"
diff --git a/src/backend/replication/logical/snapbuild.c b/src/backend/replication/logical/snapbuild.c
index dfe9d017f2..f3f5aba6a3 100644
--- a/src/backend/replication/logical/snapbuild.c
+++ b/src/backend/replication/logical/snapbuild.c
@@ -2113,10 +2113,11 @@ SnapBuildXidSetCatalogChanges(SnapBuild *builder, TransactionId xid, int subxcnt
 	{
 		int		i;
 
+		/*
+		 * We will assign subtransactions to the top transaction before
+		 * replaying the contents of the transaction.
+		 */
 		for (i = 0; i < subxcnt; i++)
-		{
-			ReorderBufferAssignChild(builder->reorder, xid, subxacts[i], lsn);
 			ReorderBufferXidSetCatalogChanges(builder->reorder, subxacts[i], lsn);
-		}
 	}
 }
-- 
2.31.1

Reply via email to