Hi Melanie > relallvisible. It seems like we should make it consistent. Perhaps we > should just remove it from heap_vacuum_rel(). Then add an assert in > all these places to at least protect development mistakes. I think there's some objection to that.
Thanks On Tue, Feb 25, 2025 at 7:35 AM Melanie Plageman <melanieplage...@gmail.com> wrote: > On Mon, Feb 24, 2025 at 4:53 PM Nathan Bossart <nathandboss...@gmail.com> > wrote: > > > > On Thu, Feb 20, 2025 at 07:35:32PM -0500, Melanie Plageman wrote: > > > Attache v7 doesn't cap the result for manual stats updating done with > > > relation_statistics_update(). > > > > Unfortunately, this already needs a rebase... > > Thanks! Attached v8 does just that. > > I've also taken a pass at updating the stats import tests to include > relallfrozen. I'm not totally sure how I feel about the results. I > erred on the side of putting relallfrozen wherever relallvisible was. > But, that means relallfrozen is included in test cases where it > doesn't seem important to the test case. But, that was true with > relallvisible too. > > Anyway, I talked to Corey off-list last week. Maybe he will have a > chance to weigh in on the test cases. Also, I had forgotten to include > relallfrozen in pg_clear_relation_stats(). I've fixed that in v8, but > now I'm worried there are some other places I may have missed. > > > > I did, however, keep the cap for the > > > places where vacuum/analyze/create index update the stats. There the > > > number for relallfrozen is coming directly from visibilitymap_count(), > > > so it should be correct. I could perhaps add an assert instead, but I > > > didn't think that really made sense. An assert is meant to help the > > > developer and what could the developer do about the visibility map > > > being corrupted. > > > > This might just be personal preference, but I think this is exactly the > > sort of thing an assertion is meant for. If we expect the value to > always > > be correct, and it's not, then our assumptions were wrong or someone has > > broken something. In both of these cases, I as a developer would really > > like to know so that I don't introduce a latent bug. If we want Postgres > > to gracefully handle or detect visibility map corruption, then maybe we > > should do both or PANIC. > > I'm on the fence about adding a PANIC. We do PANIC in other places > where we notice corruption (like PageAddItemExtended()). But, in most > of the cases, it seems like we are PANICing because there isn't a > reasonable way to accomplish the intended task. In this case, we > probably can't trust the visibility map counts for that page, but the > pg_class columns are just estimates, so just capping relallfrozen > might be good enough. > > I will note that in the other place where we may notice corruption in > the VM, in lazy_scan_prune(), we do visibilitymap_clear() and print a > WARNING -- as opposed to PANICing. Perhaps that's because there is no > need to panic, since we are already fixing the problem with > visibiliytmap_clear(). > > An assert would only help if the developer did something while > developing that corrupted the visibility map. It doesn't help keep > bogus values out of pg_class if a user's visibility map got corrupted > in some way. But, maybe that isn't needed. > > > I do see that heap_vacuum_rel() already caps relallvisible to relpages, > but > > it's not clear to me whether that's something that we expect to regularly > > happen in practice or what the adverse effects might be. So perhaps I'm > > misunderstanding the scope and severity of bogus results from > > visibilitymap_count(). Commit e6858e6, which added this code, doesn't > say > > anything about safety, but commit 3d351d9 changed the comment in question > > to its current wording. After a very quick skim of the latter's thread > > [0], I don't see any discussion about this point, either. > > Thanks for doing this archaeology. > > I am hesitant to keep the current cap on relallvisible in > heap_vacuum_rel() but then not include an equivalent cap for > relallfrozen. And I think it would be even more confusing for > relallvisible to be capped but relallfrozen has an assert instead. > > None of the other locations where relallvisible is updated > (do_analyze_rel(), index_update_stats()) do this capping of > relallvisible. It seems like we should make it consistent. Perhaps we > should just remove it from heap_vacuum_rel(). Then add an assert in > all these places to at least protect development mistakes. > > - Melanie >