On Mon, Jun 10, 2019 at 5:35 AM Asim R P <aprav...@pivotal.io> wrote: > My understanding is smgr pendingDeletes infrastructure will be replaced by > these patches. I still see CommitTransaction() calling > smgrDoPendingDeletes() in the latest patch set. Am I missing something?
Hi Asim, Thanks for looking at the patch. The pendingDeletes list is used both for files that should be deleted if we commit and files that should be deleted if we abort. This patch deals only with the abort case, using the undo log instead of pendingDeletes. That is the file leak scenario that has an arbitrarily wide window controlled by the user and is probably the source of almost all cases that you hear of of disks filling up with orphaned junk AFAICS. There could in theory be a persistent stuff-to-do-if-we-commit system exactly unlike undo logs (records to be discarded on abort, executed on commit). I haven't thought much about how it'd work, but Andres did suggest something like that for another purpose just the other day, and although it's hard to think of a name for it, it doesn't seem crazy as long as it doesn't add overheads when you're not using it. Without such a mechanism, you can probably leak files belonging to tables that you have dropped in a committed transaction, if you die in CommitTransaction() after it has called RecordTransactionCommit() but before it reaches smgrDoPendingDeletes(), and even then probably only if there is super well-timed checkpoint so that you recover without replaying the drop. I'm not try to tackle that today. BTW, there is yet another kind of deferred unlinking going on. In SyncPostCheckpoint() (formerly known as mdpostckpt()) we defer the last bit of the job until after the next checkpoint. At that point we only expect the first segment to exist and we expect it to be empty. That's a mechanism introduced by commit 6cc4451b5c47 to make sure that we don't reuse relfilenode numbers too soon in some crash scenarios. That means there is another very narrow window there to leak a file (though these ones are empty): you die after the checkpoint is logged but before SyncPostCheckpoint() is run, or even after that but before the operating system has flushed the directory. -- Thomas Munro https://enterprisedb.com