Hi, While working on benchmarks for the syscache patch (negative entries and all of that), I've ran into a an issue in remove_from_unowned_list.
If you create a lot of relations in a single transaction (say, 100k) and then abort the transaction (or if it fails for some reason, e.g. because of hitting max_locks_per_transaction), smgrDoPendingDeletes ought to clean relation files. Unfortunately, the constructed srels array is ordered in exactly the opposite way compared to "unowned list" (used by smgrclose). So what happens is that we start with the first relation, walk the whole list unowned list and remove the last item. Then we get the second rel and again walk the whole unowned list until the last item. And so on. For large number of objects that's pretty significant. With 100k rels I've lost the patience after a couple of minutes, and just nuked the whole cluster (interrupting ROLLBACK is a no go, and after a kill the recovery just starts chewing on it again). Of course, transactions creating 100k tables in a single transaction are not that common, but it's not unheard of either (such applications exist, are discussed in the syscache thread, and restoring them from a pg_dump is likely to hit this issue). And the same issue applies to cases that drop a bunch of tables in a single transaction (and I've seen plenty of scripts doing that, although in smaller batches). But it's a bit funnier, because there's also DropRelationFiles() which does smgrclose on a batch of relations too, and it says this /* * Call smgrclose() in reverse order as when smgropen() is called. * This trick enables remove_from_unowned_list() in smgrclose() * to search the SMgrRelation from the unowned list, * with O(1) performance. */ for (i = ndelrels - 1; i >= 0; i--) ... but it's called from two places in xact.c only. And if you trigger the issue with 100k x CREATE TABLE + ROLLBACK, and then force a recovery by killing postmaster, you actually get the very same behavior with always traversing the unowned list for some reason. I'm not quite sure why, but it seems the optimization is exactly the wrong thing to do here. Attached is a WIP patch removing the optimization from DropRelationFiles and adding it to smgrDoPendingDeletes. This resolves the issue, at least in the cases I've been able to reproduce. But maybe we should deal with this issue earlier by ensuring the two lists are ordered the same way from the very beginning, somehow. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/catalog/storage.c b/src/backend/catalog/storage.c index 0302507e6f..8dc9050343 100644 --- a/src/backend/catalog/storage.c +++ b/src/backend/catalog/storage.c @@ -360,7 +360,13 @@ smgrDoPendingDeletes(bool isCommit) { smgrdounlinkall(srels, nrels, false); - for (i = 0; i < nrels; i++) + /* + * Call smgrclose() in reverse order as when smgropen() is called. + * This trick enables remove_from_unowned_list() in smgrclose() + * to search the SMgrRelation from the unowned list, + * with O(1) performance. + */ + for (i = nrels-1; i >= 0; i--) smgrclose(srels[i]); pfree(srels); diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c index 2aba2dfe91..6ed68185ed 100644 --- a/src/backend/storage/smgr/md.c +++ b/src/backend/storage/smgr/md.c @@ -1699,13 +1699,7 @@ DropRelationFiles(RelFileNode *delrels, int ndelrels, bool isRedo) smgrdounlinkall(srels, ndelrels, isRedo); - /* - * Call smgrclose() in reverse order as when smgropen() is called. - * This trick enables remove_from_unowned_list() in smgrclose() - * to search the SMgrRelation from the unowned list, - * with O(1) performance. - */ - for (i = ndelrels - 1; i >= 0; i--) + for (i = 0; i < ndelrels; i++) smgrclose(srels[i]); pfree(srels); }