On 2025-Apr-15, Tender Wang wrote:

> I thought further about the lockmode calling find_inheritance_children
> in ATPrepAddPrimaryKey.
> What we do here? We first get oids of children, then check the if the
> column of children has marked not-null,  if not, report an error.
> No operation here on children.  I check other places that call
> find_inheritance_children, if we have operation on children, we usually pass
> Lockmode to find_inheritance_children, otherwise pass NoLock.

Hmm, I'm wary of doing this, although you're perhaps right that there's
no harm.  If we do need to add a not-null constraint on the children,
surely we'll acquire a stronger lock further down the execution chain.
In principle this sounds a good idea though.  (I'm not sure about doing
SearchSysCacheAttName() on a relation that might be dropped
concurrently; does dropping the child acquire lock on its parent?  I
suppose so, in which case this is okay; but still icky.  What about
DETACH CONCURRENTLY?)

However, I've also been looking at this and realized that this code can
have different structure which may allows us to skip the
find_inheritance_children() altogether.  The reason is that we already
scan the parent's list of columns searching for not-null constraints on
each of them; we only need to run this verification on children for
columns where there is none in the parent, and then only in the case
where recursion is turned off.

So I propose the attached patch, which also has some comments to
hopefully explain what is going on and why.  I ran Tom's test script a
few hundred times in a loop and I see no deadlock anymore.


Note that I also considered the idea of just not doing the check at all;
that is, if a child table doesn't have a not-null constraint, then let
  ALTER TABLE ONLY parent ADD PRIMARY KEY ( ... )
create the not-null constraint.  This works fine (it breaks one
regression test query though, would be easily fixed).  But I don't like
this very much, because it means the user could be surprised by the
lengthy unexpected runtime of creating the primary key, only to realize
that the server is checking the child table for nulls.  This is
especially bad if the user says ONLY.  I think it's better if they have
the chance to create the not-null constraint on their own volition.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
            https://twitter.com/thingskatedid/status/1456027786158776329
>From 8aacd6aa67ad6891aaea7a16b9304897798fef39 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvhe...@alvh.no-ip.org>
Date: Tue, 15 Apr 2025 20:38:54 +0200
Subject: [PATCH] Fix verification of not-null constraints on children during
 PK creation

---
 src/backend/commands/tablecmds.c | 80 +++++++++++++++++++-------------
 1 file changed, 49 insertions(+), 31 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index b3ed69457fc..03dd3754940 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -9440,6 +9440,15 @@ ATExecDropColumn(List **wqueue, Relation rel, const char *colName,
 /*
  * Prepare to add a primary key on table, by adding not-null constraints
  * on all columns.
+ *
+ * An important aspect of this is pg_dump creation of primary keys on
+ * partitioned tables, which is done by first creating the primary key
+ * constraint on the partitioned table itself as not-recursive (so that the
+ * creation of the PK itself doesn't recurse to the partitions), then creating
+ * the corresponding indexes on the partitions, then doing ALTER INDEX ATTACH
+ * PARTITION.  This maximizes parallelism.  However, it means that we must
+ * ensure the creation of not-null constraints on the partitions even if asked
+ * not to recurse.
  */
 static void
 ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd,
@@ -9447,42 +9456,13 @@ ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd,
 					AlterTableUtilityContext *context)
 {
 	Constraint *pkconstr;
+	List	   *children;
+	bool		got_children = false;
 
 	pkconstr = castNode(Constraint, cmd->def);
 	if (pkconstr->contype != CONSTR_PRIMARY)
 		return;
 
-	/*
-	 * If not recursing, we must ensure that all children have a NOT NULL
-	 * constraint on the columns, and error out if not.
-	 */
-	if (!recurse)
-	{
-		List	   *children;
-
-		children = find_inheritance_children(RelationGetRelid(rel),
-											 lockmode);
-		foreach_oid(childrelid, children)
-		{
-			foreach_node(String, attname, pkconstr->keys)
-			{
-				HeapTuple	tup;
-				Form_pg_attribute attrForm;
-
-				tup = SearchSysCacheAttName(childrelid, strVal(attname));
-				if (!tup)
-					elog(ERROR, "cache lookup failed for attribute %s of relation %u",
-						 strVal(attname), childrelid);
-				attrForm = (Form_pg_attribute) GETSTRUCT(tup);
-				if (!attrForm->attnotnull)
-					ereport(ERROR,
-							errmsg("column \"%s\" of table \"%s\" is not marked NOT NULL",
-								   strVal(attname), get_rel_name(childrelid)));
-				ReleaseSysCache(tup);
-			}
-		}
-	}
-
 	/* Verify that columns are not-null, or request that they be made so */
 	foreach_node(String, column, pkconstr->keys)
 	{
@@ -9528,12 +9508,50 @@ ATPrepAddPrimaryKey(List **wqueue, Relation rel, AlterTableCmd *cmd,
 			heap_freetuple(tuple);
 			continue;
 		}
+		else if (!recurse)
+		{
+			/*
+			 * If a not-null constraint for this column doesn't exist, and we
+			 * were asked not to recurse to children for the primary key, then
+			 * we must verify that the columns on children tables already have
+			 * a not-null constraint.  The reason for this, is that the
+			 * constraint is necessary so that our pg_dump strategy for
+			 * partitioned tables works, as explained above; and we don't want
+			 * such constraints to be created implicitly by the
+			 * makeNotNullConstraint() call below.  So here we check that a
+			 * not-null constraint exists on this column and raise an error if
+			 * not.  (We don't need to check on columns where a not-null
+			 * constraint exists on the parent, as we already verified that
+			 * it's not NO INHERIT.)
+			 */
+			if (!got_children)
+				children = find_inheritance_children(RelationGetRelid(rel),
+													 lockmode);
+
+			foreach_oid(childrelid, children)
+			{
+				HeapTuple	tup;
+				Form_pg_attribute attrForm;
+
+				tup = SearchSysCacheAttName(childrelid, strVal(column));
+				if (!tup)
+					elog(ERROR, "cache lookup failed for attribute %s of relation %u",
+						 strVal(column), childrelid);
+				attrForm = (Form_pg_attribute) GETSTRUCT(tup);
+				if (!attrForm->attnotnull)
+					ereport(ERROR,
+							errmsg("column \"%s\" of table \"%s\" is not marked NOT NULL",
+								   strVal(column), get_rel_name(childrelid)));
+				ReleaseSysCache(tup);
+			}
+		}
 
 		/* This column is not already not-null, so add it to the queue */
 		nnconstr = makeNotNullConstraint(column);
 
 		newcmd = makeNode(AlterTableCmd);
 		newcmd->subtype = AT_AddConstraint;
+		/* note we force recurse=true here; see above */
 		newcmd->recurse = true;
 		newcmd->def = (Node *) nnconstr;
 
-- 
2.39.5

Reply via email to