On Thu, Mar 4, 2021 at 7:29 AM Robert Haas <robertmh...@gmail.com> wrote:
> I think this whole approach is pretty suspect because the number of
> blocks in the relation can increase (by relation extension) or
> decrease (by VACUUM or TRUNCATE) between the time when we query for
> the list of target relations and the time we get around to executing
> any queries against them. I think it's OK to use the number of
> relation pages for progress reporting because progress reporting is
> only approximate anyway, but I wouldn't print them out in the progress
> messages, and I wouldn't try to fix up the startblock and endblock
> arguments on the basis of how long you think that relation is going to
> be.

I don't think that the struct AmcheckOptions block fields (e.g.,
startblock) should be of type 'long' -- that doesn't work well on
Windows, where 'long' is only 32-bit. To be fair we already do the
same thing elsewhere, but there is no reason to repeat those mistakes.
(I'm rather suspicious of 'long' in general.)

I think that you could use BlockNumber + strtoul() without breaking Windows.

> There are a LOT of things that can go wrong when we go try to run
> verify_heapam on a table. The table might have been dropped; in fact,
> on a busy production system, such cases are likely to occur routinely
> if DDL is common, which for many users it is. The system catalog
> entries might be screwed up, so that the relation can't be opened.
> There might be an unreadable page in the relation, either because the
> OS reports an I/O error or something like that, or because checksum
> verification fails. There are various other possibilities. We
> shouldn't view such errors as low-level things that occur only in
> fringe cases; this is a corruption-checking tool, and we should expect
> that running it against messed-up databases will be common. We
> shouldn't try to interpret the errors we get or make any big decisions
> about them, but we should have a clear way of reporting them so that
> the user can decide what to do.

I agree. Your database is not supposed to be corrupt. Once your
database has become corrupt, all bets are off -- something happened
that was supposed to be impossible -- which seems like a good reason
to be modest about what we think we know.

The user should always see the unvarnished truth. pg_amcheck should
not presume to suppress errors from lower level code, except perhaps
in well-scoped special cases.

--
Peter Geoghegan
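
P.S. To make the BlockNumber + strtoul() suggestion concrete, here is a
rough, untested-against-the-patch sketch. The helper name
parse_block_number() is hypothetical, and the typedef and MaxBlockNumber
below are local stand-ins for the definitions in storage/block.h so that
the fragment compiles on its own. It is only meant to show that the
parsing does not depend on the width of 'long', not how pg_amcheck
should necessarily be written.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins for the PostgreSQL definitions in storage/block.h. */
typedef uint32_t BlockNumber;
#define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)

/*
 * Hypothetical helper: parse a startblock/endblock style argument.
 * Returns true and sets *result on success; returns false on any
 * malformed or out-of-range input.
 */
static bool
parse_block_number(const char *arg, BlockNumber *result)
{
    const char *p = arg;
    char       *end;
    unsigned long val;

    /* strtoul() silently negates "-5", so reject a minus sign up front. */
    while (isspace((unsigned char) *p))
        p++;
    if (*p == '-' || *p == '\0')
        return false;

    errno = 0;
    val = strtoul(p, &end, 10);
    if (errno != 0 || end == p || *end != '\0')
        return false;

    /* Needed where unsigned long is 64 bits; harmless where it is 32. */
    if (val > MaxBlockNumber)
        return false;

    *result = (BlockNumber) val;
    return true;
}
```

With 32-bit 'unsigned long' (Windows) an oversized value is caught by the
ERANGE check; with 64-bit 'unsigned long' it is caught by the explicit
comparison against MaxBlockNumber. Either way the struct field can simply
be a BlockNumber.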