Re: new heapcheck contrib module

Mark Dilger Thu, 30 Jul 2020 18:39:14 -0700

> On Jul 30, 2020, at 5:53 PM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Thu, Jul 30, 2020 at 6:10 PM Mark Dilger
> <mark.dil...@enterprisedb.com> wrote:
>> No, that wasn't my concern.  I was thinking about CLOG entries disappearing 
>> during the scan as a consequence of concurrent vacuums, and the effect that 
>> would have on the validity of the cached [relfrozenxid..next_valid_xid] 
>> range.  In the absence of corruption, I don't immediately see how this would 
>> cause any problems.  But for a corrupt table, I'm less certain how it would 
>> play out.
> 
> Oh, hmm. I wasn't thinking about that problem. I think the only way
> this can happen is if we read a page and then, before we try to look
> up the CID, vacuum zooms past, finishes the whole table, and truncates
> clog. But if that's possible, then it seems like it would be an issue
> for SELECT as well, and it apparently isn't, or we would've done
> something about it by now. I think the reason it's not possible is
> because of the locking rules described in
> src/backend/storage/buffer/README, which require that you hold a
> buffer lock until you've determined that the tuple is visible. Since
> you hold a share lock on the buffer, a VACUUM that hasn't already
> processed that freeze the tuples in that buffer; it would need an
> exclusive lock on the buffer to do that. Therefore it can't finish and
> truncate clog either.
> 
> Now, you raise the question of whether this is still true if the table
> is corrupt, but I don't really see why that makes any difference.
> VACUUM is supposed to freeze each page it encounters, to the extent
> that such freezing is necessary, and with Andres's changes, it's
> supposed to ERROR out if things are messed up. We can postulate a bug
> in that logic, but inserting a VACUUM-blocking lock into this tool to
> guard against a hypothetical vacuum bug seems strange to me. Why would
> the right solution not be to fix such a bug if and when we find that
> there is one?

Since I can't think of a plausible concrete example of corruption which would 
elicit the problem I was worrying about, I'll withdraw the argument.  But that 
leaves me wondering about a comment that Andres made upthread:

> On Apr 20, 2020, at 12:42 PM, Andres Freund <and...@anarazel.de> wrote:

> I don't think random interspersed uses of CLogTruncationLock are a good
> idea. If you move to only checking visibility after tuple fits into
> [relfrozenxid, nextXid), then you don't need to take any locks here, as
> long as a lock against vacuum is taken (which I think this should do
> anyway).

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: new heapcheck contrib module

Reply via email to