Hi,

Joerg reported on IRC that after a DROP DATABASE the space of the dropped database wasn't returned to the OS until he killed a session that had been active since before the drop. I was kinda incredulous at first; the code looks sane... Thomas figured out significant parts of this.
dropdb() does a fairly careful dance to ensure processes close FDs for about-to-be-removed files:

    /*
     * Drop pages for this database that are in the shared buffer cache. This
     * is important to ensure that no remaining backend tries to write out a
     * dirty buffer to the dead database later...
     */
    DropDatabaseBuffers(db_id);

    /*
     * Tell the stats collector to forget it immediately, too.
     */
    pgstat_drop_database(db_id);

    /*
     * Tell checkpointer to forget any pending fsync and unlink requests for
     * files in the database; else the fsyncs will fail at next checkpoint, or
     * worse, it will delete files that belong to a newly created database
     * with the same OID.
     */
    ForgetDatabaseFsyncRequests(db_id);

    /*
     * Force a checkpoint to make sure the checkpointer has received the
     * message sent by ForgetDatabaseFsyncRequests. On Windows, this also
     * ensures that background procs don't hold any open files, which would
     * cause rmdir() to fail.
     */
    RequestCheckpoint(CHECKPOINT_IMMEDIATE | CHECKPOINT_FORCE | CHECKPOINT_WAIT);

    /*
     * Remove all tablespace subdirs belonging to the database.
     */
    remove_dbtablespaces(db_id);

By my reading of the code that ensures that FDs held open by checkpointer and bgwriter are closed in a relatively timely fashion (although I don't think there's a nice time-bound guarantee for bgwriter, which could be problematic on Windows).

Unfortunately that disregards that normal backends can have plenty of files open for the to-be-dropped database, if there's any sort of shared_buffers pressure: whenever a backend writes out a dirty victim page belonging to another database, it also opens files in that database. I thought that wouldn't be a problem, because surely we do an smgrclose() at some appropriate point to avoid that issue. But we don't.

I don't quite know how to trivially [1] guarantee that an invalidation we send out is processed by all backends before continuing.
But even if we just sent out an invalidation that instructs all backends to do an smgrcloseall() the next time they do invalidation processing, we'd be much better off than where we are now (where we'll never close those files until there's a lot of FD pressure or the backend exits).

I wonder if this is responsible for some issues we had with Windows in the past...

[1] I think we need something like an acknowledge protocol for a few things anyway, most recently for online enabling of checksums. What I had in mind is simple:

1) one global 64bit request counter in shared memory
2) a per-backend counter in shared memory, acknowledging up to which point requests have been processed
3) per-purpose procsignals that request e.g. that checksums get enabled, that smgrcloseall() has been processed, etc.
4) a function that sends out such a signal to all backends and then waits till all such requests have been processed

This obviously shouldn't be used for very frequent occurrences, but the use cases I have in mind never should be.

Greetings,

Andres Freund