Hello, Andres Freund wrote:
> On 2017-12-06 17:23:55 -0300, Alvaro Herrera wrote: > > > I've played around quite some with the attached patch. So far, after > > > applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make > > > the situation worse for already existing corruption. HOT pruning can > > > change the exact appearance of existing corruption a bit, but I don't > > > think it can make the corruption meaningfully worse. It's a bit > > > annoying and scary to add so many checks to backbranches but it kinda > > > seems required. The error message texts aren't perfect, but these are > > > "should never be hit" type elog()s so I'm not too worried about that. > > > > Looking at 0002: I agree with the stuff being done here. I think a > > couple of these checks could be moved one block outerwards in term of > > scope; I don't see any reason why the check should not apply in that > > case. I didn't catch any place missing additional checks. > > I think I largely put them into the inner blocks because they were > guaranteed to be reached in those case (the horizon has to be before the > cutoff etc), and that way additional branches are avoided. Hmm, it should be possible to call vacuum with a very low freeze_min_age (which sets a very recent relfrozenxid), then shortly thereafter call it with a large one, no? So it's not really guaranteed ... > > Despite these being "shouldn't happen" conditions, I think we should > > turn these up all the way to ereports with an errcode and all, and also > > report the XIDs being complained about. No translation required, > > though. Other than those changes and minor copy editing a commit > > (attached), 0002 looks good to me. > > Hm, I don't really care one way or another. I do see that you used > errmsg() in some places, errmsg_internal() in others. Was that > intentional? Eh, no, my intention was to make these all errmsg_internal() to avoid translation (serves no purpose here). Feel free to update the remaining ones. > > I started thinking it'd be good to report block number whenever anything > > happened while scanning the relation. The best way to go about this > > seems to be to add an errcontext callback to lazy_scan_heap, so I attach > > a WIP untested patch to add that. (I'm not proposing this for > > back-patch for now, mostly because I don't have the time/energy to push > > for it right now.) > > That seems like a good idea. There's some cases where that could > increase log spam noticeably (unitialized blocks), but that seems > acceptable. Yeah, I noticed that and I agree it seems ok. > > + if (info->blkno != InvalidBlockNumber) > > + errcontext("while scanning page %u of relation %s", > > + info->blkno, > > RelationGetRelationName(info->relation)); > > + else > > + errcontext("while vacuuming relation %s", > > + RelationGetRelationName(info->relation)); > > Hm, perhaps rephrase so both messages refer to vacuuming? E.g. just by > replacing scanning with vacuuming? Makes sense. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services