Alvaro Herrera <alvhe...@2ndquadrant.com> writes:
> On 2018-Aug-25, Pavan Deolasee wrote:
>> Now of course, the file is really missing. But the user was quite surprised
>> that they couldn't connect to any database, even though mishap happened to
>> a user table in one of their reporting databases.
> Hmm, that sounds like there's a bunch of dirty pages waiting to be
> written to that nonexistant file, and the error prevents the starting
> backend from acquiring a free page on which to read something from disk
> for another relation.

Perhaps so --- but wouldn't this require that every buffer in shared
buffers now belong to the corrupted file? Or have we broken the
allocation algorithm such that the same buffer keeps getting handed out
to every request?

I'm starting to wonder if this type of scenario needs to be considered
alongside the truncation corruption issues we're discussing nearby.
What do you do given a persistent failure to write a dirty block? It's
hard to see how you get to an answer that doesn't result in (a)
corrupted data or (b) a stuck database, neither of which is pleasant.
But I think right now our behavior will lead to (b), which is what this
is reporting --- until you do stop -m immediate, and then likely you've
got (a).

regards, tom lane
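[A toy model of the failure mode under discussion may help make it concrete. This is a hypothetical, heavily simplified sketch, not PostgreSQL's actual buffer manager: the names `Buffer`, `flush`, and `allocate_buffer` are invented for illustration. It shows how a clock-sweep-style allocator gets wedged (outcome (b) above) once every candidate buffer is dirty and the write-back of its page fails persistently, e.g. because the backing file is gone.]

```python
# Hypothetical, simplified clock-sweep allocator (illustration only; not
# PostgreSQL code). A buffer must be clean before it can be reused; if
# flushing a dirty buffer always fails, allocation can never succeed.

class WriteError(Exception):
    pass

class Buffer:
    def __init__(self, relfile, dirty):
        self.relfile = relfile   # file the cached page belongs to
        self.dirty = dirty       # True => must be written back before reuse

def flush(buf, missing_files):
    """Write a dirty page back; fails if its backing file has vanished."""
    if buf.relfile in missing_files:
        raise WriteError("could not open file %r" % buf.relfile)
    buf.dirty = False

def allocate_buffer(pool, missing_files, max_sweeps=3):
    """Sweep the pool, reusing the first buffer that is (or can be made) clean."""
    for _ in range(max_sweeps):
        for buf in pool:
            if not buf.dirty:
                return buf
            try:
                flush(buf, missing_files)
                return buf
            except WriteError:
                continue  # skip this buffer and keep sweeping
    # Every buffer is dirty and un-flushable: the requesting backend is stuck.
    raise RuntimeError("no usable buffer: all dirty pages fail to flush")

# Every buffer holds a dirty page of a file that has been removed:
pool = [Buffer("base/16384/16385", dirty=True) for _ in range(4)]
try:
    allocate_buffer(pool, missing_files={"base/16384/16385"})
except RuntimeError as e:
    print(e)
```

[In this toy model the stuck state indeed requires *every* buffer to be dirty and un-writable, matching the objection above: with even one clean or flushable buffer, allocation still succeeds.]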