On Wed, Sep 27, 2023 at 11:06:37AM +1300, Thomas Munro wrote:
> I don't have an opinion yet on your other thread about making this
> stuff configurable for replicas, but for the simple crash recovery
> case shown here, hard failure makes sense to me.

Also, if we conclude that we're OK with just failing hard all the time
for crash recovery and archive recovery on OOM, the other patch is not
really required.  That would be disruptive for standbys in some cases,
though perhaps still OK in the long term.

I am wondering if people have lost data because of this problem on
production systems, actually.  It would not be possible to know that it
happened until you see a page on disk that has a somewhat valid LSN,
yet an LSN older than the position currently being inserted, and that
could show up in various forms.  Even that could get hidden quickly if
WAL is written at a fast pace after a crash recovery.  A standby
promotion at an older LSN would be unlikely, as monitoring solutions
discard standbys lagging behind by N bytes.

> *A more detailed analysis would talk about sectors (page header is
> atomic), and consider whether we're only trying to defend ourselves
> against recycled pages written by PostgreSQL (yes), arbitrary random
> data (no, but it's probably still pretty good) or someone trying to
> trick us (no, and we don't stand a chance).

WAL would not be the only part of the system that would get borked if
arbitrary bytes can be inserted into what's read from disk, random or
not.
--
Michael