Hi Robert,

On Mon, Dec 11, 2023 at 6:08 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Fri, Dec 8, 2023 at 5:02 AM Jakub Wartak
> <jakub.war...@enterprisedb.com> wrote:
> > While we are at it, maybe around the below in PrepareForIncrementalBackup()
> >
> >         if (tlep[i] == NULL)
> >             ereport(ERROR,
> >                     (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >                      errmsg("timeline %u found in manifest, but not in this server's history",
> >                             range->tli)));
> >
> > we could add
> >
> > errhint("You might need to start a new full backup instead of incremental one")
> >
> > ?
>
> I can't exactly say that such a hint would be inaccurate, but I think
> the impulse to add it here is misguided. One of my design goals for
> this system is to make it so that you never have to take a new
> incremental backup "just because,"

Did you mean "take a new full backup" here?

> not even in case of an intervening
> timeline switch. So, all of the errors in this function are warning
> you that you've done something that you really should not have done.
> In this particular case, you've either (1) manually removed the
> timeline history file, and not just any timeline history file but the
> one for a timeline for a backup that you still intend to use as the
> basis for taking an incremental backup or (2) tried to use a full
> backup taken from one server as the basis for an incremental backup on
> a completely different server that happens to share the same system
> identifier, e.g. because you promoted two standbys derived from the
> same original primary and then tried to use a full backup taken on one
> as the basis for an incremental backup taken on the other.
>

Okay, but please consider two other possibilities:

(3) I had a corrupted DB that I fixed by running pg_resetwal, and just a
day later a cronjob attempted to take an incremental backup and failed
with that error.

(4) I ran pg_upgrade (which calls pg_resetwal on the freshly initdb'd
directory) on a DB that had such a cronjob, and the cronjob then failed
with this error.

I bet that (4) is going to happen more often than (1) or (2), which
might prompt users to complain on forums or in support tickets.

> > > I have a fix for this locally, but I'm going to hold off on publishing
> > > a new version until either there's a few more things I can address all
> > > at once, or until Thomas commits the ubsan fix.
> > >
> >
> > Great, I cannot get it to fail again today, it had to be some dirty
> > state of the testing env. BTW: Thomas has pushed that ubsan fix.
>
> Huzzah, the cfbot likes the patch set now. Here's a new version with
> the promised fix for your non-reproducible issue. Let's see whether
> you and cfbot still like this version.

LGTM, all quick tests work from my end too. BTW: I have also scheduled
a long/large pgbench -s 14000 (~200GB?) multi-day incremental test.
I'll let you know how it went.

-J.