On Fri, Jan 10, 2020 at 11:56:37AM +0530, Mahendra Singh Thalor wrote:
> I reviewed and tested the patch. After applying patch, I am getting other
> assert failure.
>
> I think, before committing 1st patch, we should fix this crash also and
> then we should commit all the patches.

I have somewhat managed to break my environment for a couple of days
so as I got zero testing done with assertions, so I missed this one.
Thanks for the lookup!  The environment is fixed since.

This code path uses an assertion that would become incorrect once you
are able to create in pg_class temporary relations which rely on a
temporary schema that does not exist anymore, because its schema has
been dropped it, and that's what you are doing.  The assertion does
not concern only autovacuum originally, as it would fail each time a
session tries to build a relation descriptor for its cache with a
relation using a non-existing namespace.  I have not really dug if
that's actually possible to trigger..  Anyway.

So, on the one hand, saying that we allow orphaned temporary relations
to be dropped even if their schema does not exist is what autovacuum
does now more aggressively, so that can help to avoid having to clean
up yourself orphaned entries from catalogs, following up with their
toast entries, etc.  And this approach makes the assertion lose its
meaning for autovacuum.

On the other hand keeping this assertion makes sure that we never try
to load incorrect relcache entries, and just make autovacuum less
aggressive by ignoring orphaned entries with incorrect namespace
references, though the user experience in fixing the cluster means
manual manipulation of the catalogs.  This is something I understood
we'd like to avoid as much as possible, while keeping autovacuum
aggressive on the removal as that can ease the life of people fixing a
cluster.  So this would bring us back to a point intermediate of
246a6c8.

This makes me wonder how much we should try to outsmart somebody which
puts the catalogs in such a inconsistent state.  Hmm.  Perhaps at the
end autovacuum should just ignore such entries and just don't help the
user at all as this also comes with its own issues with the storage
level as well as smgr.c uses rd_backend.  And if the user plays with
temporary namespaces like that with superuser rights, he likely knows
what he is doing.  Perhaps not :D, in which case autovacuum may not be
the best thing to decide that.  I still think we should make the log
of autovacuum.c for orphaned relations more careful with its coding
though, and fix it with the previous patch.  The documentation of
isTempNamespaceInUse() could gain in clarity, just a nit from me while
looking at the surroundings.  And actually I found an issue with its
logic, as the routine would not consider a temp namespace in use for a
session's own MyBackendId.  As that's only used for autovacuum, this
has no consequence, but let's be correct in hte long run.

And this gives the attached after a closer lookup.  Thoughts?
--
Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c
index c82f9fc4b5..0987986a8f 100644
--- a/src/backend/catalog/namespace.c
+++ b/src/backend/catalog/namespace.c
@@ -3235,10 +3235,14 @@ isTempNamespaceInUse(Oid namespaceId)
 
 	backendId = GetTempNamespaceBackendId(namespaceId);
 
-	if (backendId == InvalidBackendId ||
-		backendId == MyBackendId)
+	/* No such temporary namespace? */
+	if (backendId == InvalidBackendId)
 		return false;
 
+	/* Is the namespace used by this backend? */
+	if (backendId == MyBackendId)
+		return true;
+
 	/* Is the backend alive? */
 	proc = BackendIdGetProc(backendId);
 	if (proc == NULL)
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index f0e40e36af..d8702d650a 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -2071,9 +2071,11 @@ do_autovacuum(void)
 		{
 			/*
 			 * We just ignore it if the owning backend is still active and
-			 * using the temporary schema.
+			 * using the temporary schema.  If the namespace does not exist
+			 * ignore the entry.
 			 */
-			if (!isTempNamespaceInUse(classForm->relnamespace))
+			if (!isTempNamespaceInUse(classForm->relnamespace) &&
+				get_namespace_name(classForm->relnamespace) != NULL)
 			{
 				/*
 				 * The table seems to be orphaned -- although it might be that
@@ -2202,6 +2204,7 @@ do_autovacuum(void)
 		Oid			relid = lfirst_oid(cell);
 		Form_pg_class classForm;
 		ObjectAddress object;
+		char	   *nspname;
 
 		/*
 		 * Check for user-requested abort.
@@ -2249,12 +2252,18 @@ do_autovacuum(void)
 			continue;
 		}
 
-		/* OK, let's delete it */
-		ereport(LOG,
-				(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
-						get_database_name(MyDatabaseId),
-						get_namespace_name(classForm->relnamespace),
-						NameStr(classForm->relname))));
+		nspname = get_namespace_name(classForm->relnamespace);
+
+		if (nspname != NULL)
+			ereport(LOG,
+					(errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"",
+							get_database_name(MyDatabaseId),
+							nspname, NameStr(classForm->relname))));
+		else
+			ereport(LOG,
+					(errmsg("autovacuum: dropping orphan temp table \"%s\" with OID %u in database \"%s\"",
+							NameStr(classForm->relname), relid,
+							get_database_name(MyDatabaseId))));
 
 		object.classId = RelationRelationId;
 		object.objectId = relid;

Attachment: signature.asc
Description: PGP signature

Reply via email to