On Fri, May 17, 2024 at 18:02, Peter Geoghegan <p...@bowt.ie> wrote:

> On Fri, May 17, 2024 at 9:13 AM Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
> > After migrating to PostgreSQL 16 I have seen broken tables on replica
> > nodes three times (about once a week). The query fails with the error
> >
> > ERROR:  could not access status of transaction 1442871302
> > DETAIL:  Could not open file "pg_xact/0560": No such file or directory
>
> You've shown an inconsistency between the primary and standby with
> respect to the heap tuple infomask bits related to freezing. It looks
> like a FREEZE WAL record from the primary was never replayed on the
> standby.
>

I think it is possible that the broken tuples were created before the
upgrade from Postgres 15 to Postgres 16 (not long before), so this bug
could be a side effect of the upgrade.



>
> It's natural for me to wonder if my Postgres 16 work on page-level
> freezing might be a factor here. If that really was true, then it
> would be necessary to explain why the primary and standby are
> inconsistent (no reason to suspect a problem on the primary here).
> It'd have to be the kind of issue that could be detected mechanically
> using wal_consistency_checking, but wasn't detected that way before
> now -- that seems unlikely.
>
> It's worth considering if the more aggressive behavior around
> relfrozenxid advancement (in 15) and freezing (in 16) has increased
> the likelihood of problems like these in setups that were already
> faulty, in whatever way. The standby database is indeed corrupt, but
> even on 16 it's fairly isolated corruption in practical terms. The
> full extent of the problem is clear once amcheck is run, but only one
> tuple can actually cause the system to error due to the influence of
> hint bits (for better or worse, hint bits mask the problem quite well,
> even on 16).
>
> --
> Peter Geoghegan
>
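
For reference, the file name in the error message above follows directly
from the XID. A minimal sketch in Python, assuming a default build (8 kB
blocks, 2 status bits per transaction, 32 pages per SLRU segment);
pg_xact_segment is a hypothetical helper name used only for illustration:

```python
# Sketch: map an XID to the pg_xact segment file that holds its commit status.
# Assumptions (default build): BLCKSZ = 8192, 2 status bits per transaction
# (4 transactions per byte), 32 pages per segment file.
BLCKSZ = 8192
XACTS_PER_BYTE = 4
PAGES_PER_SEGMENT = 32
XACTS_PER_SEGMENT = BLCKSZ * XACTS_PER_BYTE * PAGES_PER_SEGMENT  # 1048576


def pg_xact_segment(xid: int) -> str:
    """Return the 4-hex-digit pg_xact file name covering this XID."""
    return format(xid // XACTS_PER_SEGMENT, "04X")


print(pg_xact_segment(1442871302))  # 0560
```

So "pg_xact/0560" is the segment that would hold the status of transaction
1442871302. That would fit a tuple that was left unfrozen on the standby
while the corresponding segment had already been truncated.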
