On Fri, May 17, 2024 at 18:02, Peter Geoghegan <p...@bowt.ie> wrote:

> On Fri, May 17, 2024 at 9:13 AM Pavel Stehule <pavel.steh...@gmail.com> wrote:
> > after migration on PostgreSQL 16 I seen 3x times (about every week)
> > broken tables on replica nodes. The query fails with error
> >
> > ERROR: could not access status of transaction 1442871302
> > DETAIL: Could not open file "pg_xact/0560": No such file or directory
>
> You've shown an inconsistency between the primary and standby with
> respect to the heap tuple infomask bits related to freezing. It looks
> like a FREEZE WAL record from the primary was never replayed on the
> standby.

I think it is possible that the broken tuples were created before the
upgrade from Postgres 15 to Postgres 16 (not long before it), so this bug
may be a side effect of the upgrade.

> It's natural for me to wonder if my Postgres 16 work on page-level
> freezing might be a factor here. If that really was true, then it
> would be necessary to explain why the primary and standby are
> inconsistent (no reason to suspect a problem on the primary here).
> It'd have to be the kind of issue that could be detected mechanically
> using wal_consistency_checking, but wasn't detected that way before
> now -- that seems unlikely.
>
> It's worth considering if the more aggressive behavior around
> relfrozenxid advancement (in 15) and freezing (in 16) has increased
> the likelihood of problems like these in setups that were already
> faulty, in whatever way. The standby database is indeed corrupt, but
> even on 16 it's fairly isolated corruption in practical terms. The
> full extent of the problem is clear once amcheck is run, but only one
> tuple can actually cause the system to error due to the influence of
> hint bits (for better or worse, hint bits mask the problem quite well,
> even on 16).
>
> --
> Peter Geoghegan
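
For reference, the file name in the error message follows from the transaction ID itself, which confirms that the missing segment is the one that would hold the status of that XID. The sketch below is only an illustration: the helper name pg_xact_segment is hypothetical, and it assumes a default build (8 kB blocks, 2 status bits per transaction, 32 pages per SLRU segment).

```python
# Minimal sketch: map a transaction ID to its pg_xact segment file name.
# Assumption: default PostgreSQL build constants (BLCKSZ = 8192).
BLCKSZ = 8192                 # bytes per page
CLOG_XACTS_PER_BYTE = 4       # 2 status bits per transaction
SLRU_PAGES_PER_SEGMENT = 32   # pages per segment file


def pg_xact_segment(xid: int) -> str:
    """Return the 4-hex-digit pg_xact segment name that covers xid."""
    xacts_per_page = BLCKSZ * CLOG_XACTS_PER_BYTE
    xacts_per_segment = xacts_per_page * SLRU_PAGES_PER_SEGMENT
    return format(xid // xacts_per_segment, "04X")


print(pg_xact_segment(1442871302))  # 0560
```

The result matches "pg_xact/0560" from the error above, so the standby is looking up commit status for an XID whose segment has already been truncated away, which is what one would expect if the tuple was frozen on the primary but not on the standby.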