On Sun, Dec 29, 2019 at 07:37:15AM -0500, Robert Haas wrote: > I think this was a bad idea and that it should be reverted. It seems > to me that the problem here is that you introduced a feature which had > a bug, namely that it couldn't tolerate concurrency, and when somebody > discovered the bug, you "fixed" it not by making the code able to > tolerate concurrent activity but by preventing concurrent activity > from happening in the first place. I think that's wrong on general > principle.
Sorry for the delay, there was a long period off here so I could not have a serious look. The behavior of the code in 246a6c8 has changed so as a non-existing temporary namespace is considered as not in use, in which case autovacuum would consider this relation to be orphaned, and it would then try to drop it. Anyway, just a revert of the patch is not a good idea either, because keeping around the old behavior allows any user to create orphaned relations that autovacuum would just ignore in 9.4~10, leading to ultimately a forced shutdown of the instance as no cleanup can happen if this goes unnoticed. This also puts pg_class into an inconsistent state as pg_class entries would include references to a namespace that does not exist for sessions still holding its own references to myTempNamespace/myTempToastNamespace. > In this specific case, DROP SCHEMA on another temporary sessions's > schema is a feature which has existed for a very long time and which I > have used on multiple occasions to repair damaged databases. Suppose, > for example, there's a catalog entry that prevents the schema from > being dropped. Before this commit, you could fix it or delete the > entry and then retry the drop. Now, you can't. You can maybe wait for > autovacuum to retry it or something, assuming autovacuum is working > and you're in multi-user mode. This behavior is broken since its introduction then per the above. If we were to allow DROP SCHEMA to work properly on temporary schema, we would need to do more than what we have now, and that does not involve just mimicking DISCARD TEMP if you really wish to be able to drop the schema entirely and not only the objects it includes. Allowing a temporary schema to be dropped only if it is owned by the current session would be simple enough to implement, but I think that allowing that to work properly for a schema owned by another session would be rather difficult to implement for little gains. Now, if you still wish to be able to do a DROP SCHEMA on a temporary schema, I have no objections to allow doing that, but under some conditions. So I would recommend to restrict it so as this operation is not allowed by default, and I think we ought to use allow_system_table_mods to control that, because if you were to do that you are an operator and you know what you are doing. Normally :) > But even if that weren't the case, this seems like a very fragile fix. > Maybe someday we'll allow multiple autovacuum workers in the same > database, and the problem comes back. Maybe some user who can't drop > the schema because of this arbitrary prohibition will find themselves > forced to delete the pg_namespace row by hand and then crash the > server. Most server code is pretty careful that to either tolerate > missing system catalog tuples or elog(ERROR), not crash (e.g. cache > lookup failed for ...). This code shouldn't be an exception to that > rule. You are right here, things could be done better in 11 and newer versions, still there are multiple ways to do that. Here are three suggestions: 1) Issue an elog(ERROR) as that's what we do usually for lookup errors and such when seeing an orphaned relation which refers to a non-existing namespace. But this would prevent autovacuum to do any kind of work and just loop over-and-over on the same error, just bloating the database involved. 2) Ignore the relation and leave it around, though we really have been fighting to make autovacuum more aggressive, so that would defeat the work done lately for that purpose. 3) Still drop the orphaned relation even if it references to a non-existing schema, generating an appropriate LOG message so as the problem comes from an incorrect lookup at the namespace name. Attached is a patch doing two things: a) Control DROP SCHEMA on a temporary namespace using allow_system_table_mods. b) Generate a non-buggy LOG message if trying to remove a temp relation referring to a temporary schema that does not exist, using "(null)" as a replacement for the schema name. My suggestion is to do a) down to 9.4 if that's thought to be helpful to have, and at least Robert visibly thinks so, then b) in 11~ as that's where 246a6c8 exists. Comments welcome. -- Michael
diff --git a/src/backend/commands/dropcmds.c b/src/backend/commands/dropcmds.c index e7891a4418..725e395bbe 100644 --- a/src/backend/commands/dropcmds.c +++ b/src/backend/commands/dropcmds.c @@ -109,7 +109,8 @@ RemoveObjects(DropStmt *stmt) * result in inconsistencies within the session whose temporary schema * has been dropped. */ - if (stmt->removeType == OBJECT_SCHEMA && + if (!allowSystemTableMods && + stmt->removeType == OBJECT_SCHEMA && isAnyTempNamespace(address.objectId)) ereport(ERROR, (errcode(ERRCODE_WRONG_OBJECT_TYPE), diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c index f0e40e36af..c3bf02b040 100644 --- a/src/backend/postmaster/autovacuum.c +++ b/src/backend/postmaster/autovacuum.c @@ -2250,11 +2250,16 @@ do_autovacuum(void) } /* OK, let's delete it */ - ereport(LOG, - (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"", - get_database_name(MyDatabaseId), - get_namespace_name(classForm->relnamespace), - NameStr(classForm->relname)))); + if (log_min_messages <= LOG) + { + char *schemaname = get_namespace_name(classForm->relnamespace); + + ereport(LOG, + (errmsg("autovacuum: dropping orphan temp table \"%s.%s.%s\"", + get_database_name(MyDatabaseId), + schemaname != NULL ? schemaname : "(null)", + NameStr(classForm->relname)))); + } object.classId = RelationRelationId; object.objectId = relid;
signature.asc
Description: PGP signature