On Thu, Mar 4, 2021 at 7:29 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think this whole approach is pretty suspect because the number of
> blocks in the relation can increase (by relation extension) or
> decrease (by VACUUM or TRUNCATE) between the time when we query for
> the list of target relations and the time we get around to executing
> any queries against them. I think it's OK to use the number of
> relation pages for progress reporting because progress reporting is
> only approximate anyway, but I wouldn't print them out in the progress
> messages, and I wouldn't try to fix up the startblock and endblock
> arguments on the basis of how long you think that relation is going to
> be.

I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32 bits wide, even in 64-bit builds. To
be fair, we already do the same thing elsewhere, but there is no reason
to repeat those mistakes. (I'm rather suspicious of 'long' in general.)

I think that you could use BlockNumber + strtoul() without breaking Windows.
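
To illustrate, here is a rough sketch of what I have in mind. It is not
taken from the patch: parse_block_number() is a hypothetical helper, and
the BlockNumber typedef and MaxBlockNumber macro are local stand-ins for
the definitions in storage/block.h. Since every valid block number fits
in 32 bits, it doesn't matter that 'unsigned long' is only 32 bits wide
on Windows.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for PostgreSQL's definitions in storage/block.h. */
typedef uint32_t BlockNumber;
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)

/*
 * Hypothetical helper: parse a --startblock/--endblock style argument.
 * Returns true and sets *result on success, false on any invalid input.
 */
static bool
parse_block_number(const char *arg, BlockNumber *result)
{
	char	   *endptr;
	unsigned long val;

	/*
	 * strtoul() skips leading whitespace and silently accepts a sign,
	 * so insist that the argument starts with a digit.
	 */
	if (arg == NULL || !isdigit((unsigned char) *arg))
		return false;

	errno = 0;
	val = strtoul(arg, &endptr, 10);

	/* Reject overflow (ERANGE) and trailing garbage such as "12abc". */
	if (errno != 0 || *endptr != '\0')
		return false;

	/* 0xFFFFFFFF is InvalidBlockNumber, so it is out of range too. */
	if (val > MaxBlockNumber)
		return false;

	*result = (BlockNumber) val;
	return true;
}
```

Note that this only validates the argument as a block number. It does
not clamp it to the relation's current length, which can change before
the check runs anyway.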

> There are a LOT of things that can go wrong when we go try to run
> verify_heapam on a table. The table might have been dropped; in fact,
> on a busy production system, such cases are likely to occur routinely
> if DDL is common, which for many users it is. The system catalog
> entries might be screwed up, so that the relation can't be opened.
> There might be an unreadable page in the relation, either because the
> OS reports an I/O error or something like that, or because checksum
> verification fails. There are various other possibilities. We
> shouldn't view such errors as low-level things that occur only in
> fringe cases; this is a corruption-checking tool, and we should expect
> that running it against messed-up databases will be common. We
> shouldn't try to interpret the errors we get or make any big decisions
> about them, but we should have a clear way of reporting them so that
> the user can decide what to do.

I agree.

Your database is not supposed to be corrupt. Once your database has
become corrupt, all bets are off -- something happened that was
supposed to be impossible -- which seems like a good reason to be
modest about what we think we know.

The user should always see the unvarnished truth. pg_amcheck should
not presume to suppress errors from lower-level code, except perhaps
in well-scoped special cases.

-- 
Peter Geoghegan