On Fri, Jan 10, 2020 at 11:56:37AM +0530, Mahendra Singh Thalor wrote: > I reviewed and tested the patch. After applying patch, I am getting other > assert failure. > > I think, before committing 1st patch, we should fix this crash also and > then we should commit all the patches.
I have somewhat managed to break my environment for a couple of days so as I got zero testing done with assertions, so I missed this one. Thanks for the lookup! The environment is fixed since. This code path uses an assertion that would become incorrect once you are able to create in pg_class temporary relations which rely on a temporary schema that does not exist anymore, because its schema has been dropped it, and that's what you are doing. The assertion does not concern only autovacuum originally, as it would fail each time a session tries to build a relation descriptor for its cache with a relation using a non-existing namespace. I have not really dug if that's actually possible to trigger.. Anyway. So, on the one hand, saying that we allow orphaned temporary relations to be dropped even if their schema does not exist is what autovacuum does now more aggressively, so that can help to avoid having to clean up yourself orphaned entries from catalogs, following up with their toast entries, etc. And this approach makes the assertion lose its meaning for autovacuum. On the other hand keeping this assertion makes sure that we never try to load incorrect relcache entries, and just make autovacuum less aggressive by ignoring orphaned entries with incorrect namespace references, though the user experience in fixing the cluster means manual manipulation of the catalogs. This is something I understood we'd like to avoid as much as possible, while keeping autovacuum aggressive on the removal as that can ease the life of people fixing a cluster. So this would bring us back to a point intermediate of 246a6c8. This makes me wonder how much we should try to outsmart somebody which puts the catalogs in such a inconsistent state. Hmm. Perhaps at the end autovacuum should just ignore such entries and just don't help the user at all as this also comes with its own issues with the storage level as well as smgr.c uses rd_backend. And if the user plays with temporary namespaces like that with superuser rights, he likely knows what he is doing. Perhaps not :D, in which case autovacuum may not be the best thing to decide that. I still think we should make the log of autovacuum.c for orphaned relations more careful with its coding though, and fix it with the previous patch. The documentation of isTempNamespaceInUse() could gain in clarity, just a nit from me while looking at the surroundings. And actually I found an issue with its logic, as the routine would not consider a temp namespace in use for a session's own MyBackendId. As that's only used for autovacuum, this has no consequence, but let's be correct in hte long run. And this gives the attached after a closer lookup. Thoughts? -- Michael
diff --git a/src/backend/catalog/namespace.c b/src/backend/catalog/namespace.c index c82f9fc4b5..0987986a8f 100644 --- a/src/backend/catalog/namespace.c +++ b/src/backend/catalog/namespace.c @@ -3235,10 +3235,14 @@ isTempNamespaceInUse(Oid namespaceId) backendId = GetTempNamespaceBackendId(namespaceId); - if (backendId == InvalidBackendId || - backendId == MyBackendId) + /* No such temporary namespace? */ + if (backendId == InvalidBackendId) return false; + /* Is the namespace used by this backend? */ + if (backendId == MyBackendId) + return true; + /* Is the backend alive? */ proc = BackendIdGetProc(backendId); if (proc == NULL) diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index f0e40e36af..d8702d650a 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2071,9 +2071,11 @@ do_autovacuum(void) { /* * We just ignore it if the owning backend is still active and - * using the temporary schema. + * using the temporary schema. If the namespace does not exist + * ignore the entry. */ - if (!isTempNamespaceInUse(classForm->relnamespace)) + if (!isTempNamespaceInUse(classForm->relnamespace) && + get_namespace_name(classForm->relnamespace) != NULL) { /* * The table seems to be orphaned -- although it might be that @@ -2202,6 +2204,7 @@ do_autovacuum(void) Oid relid = lfirst_oid(cell); Form_pg_class classForm; ObjectAddress object; + char *nspname; /* * Check for user-requested abort. @@ -2249,12 +2252,18 @@ do_autovacuum(void) continue; } - /* OK, let's delete it */ - ereport(LOG, - (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"", - get_database_name(MyDatabaseId), - get_namespace_name(classForm->relnamespace), - NameStr(classForm->relname)))); + nspname = get_namespace_name(classForm->relnamespace); + + if (nspname != NULL) + ereport(LOG, + (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"", + get_database_name(MyDatabaseId), + nspname, NameStr(classForm->relname)))); + else + ereport(LOG, + (errmsg("autovacuum: dropping orphan temp table \"%s\" with OID %u in database \"%s\"", + NameStr(classForm->relname), relid, + get_database_name(MyDatabaseId)))); object.classId = RelationRelationId; object.objectId = relid;
signature.asc
Description: PGP signature