Hi Robert,

On Tue, Dec 19, 2023 at 9:36 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> On Fri, Dec 15, 2023 at 5:36 AM Jakub Wartak
> <jakub.war...@enterprisedb.com> wrote:
> > I've played with initdb/pg_upgrade (17->17) and I don't get a DBID
> > mismatch (of course they do differ after initdb), but I get this
> > instead:
> >
> > $ pg_basebackup -c fast -D /tmp/incr2.after.upgrade -p 5432
> > --incremental /tmp/incr1.before.upgrade/backup_manifest
> > WARNING: aborting backup due to backend exiting before pg_backup_stop
> > was called
> > pg_basebackup: error: could not initiate base backup: ERROR: timeline
> > 2 found in manifest, but not in this server's history
> > pg_basebackup: removing data directory "/tmp/incr2.after.upgrade"
> >
> > Also in the manifest I don't see the DBID?
> > Maybe it's a nuisance, and all I'm trying to say is that if an
> > automated cronjob with pg_basebackup --incremental hits a freshly
> > upgraded cluster, that error message without an errhint() is going to
> > scare some junior DBAs.
>
> Yeah. I think we should add the system identifier to the manifest, but
> I think that should be left for a future project, as I don't think the
> lack of it is a good reason to stop all progress here. When we have
> that, we can give more reliable error messages about system mismatches
> at an earlier stage. Unfortunately, I don't think that the timeline
> messages you're seeing here are going to apply in every case: suppose
> you have two unrelated servers that are both on timeline 1. I think
> you could use a base backup from one of those servers and use it as
> the basis for the incremental from the other, and I think that if you
> did it right you might fail to hit any sanity check that would block
> that. pg_combinebackup will realize there's a problem, because it has
> the whole cluster to work with, not just the manifest, and will notice
> the mismatching system identifiers, but that's kind of late to find
> out that you made a big mistake. However, right now, it's the best we
> can do.

OK, understood.
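Until the system identifier makes it into the manifest, I guess one manual
sanity check that could be run before starting the incremental is to
compare pg_controldata output for the backup the manifest came from
against the live cluster, e.g. (paths from the test above; this assumes a
plain-format backup, so that pg_control is present in the backup
directory):

    $ pg_controldata /tmp/incr1.before.upgrade | grep 'system identifier'
    $ pg_controldata $PGDATA | grep 'system identifier'

If the two identifiers differ, the incremental cannot be valid anyway, so
it seems better to catch that up front rather than at pg_combinebackup
time.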
> > The incrementals are being generated, but just for the first (0)
> > segment of the relation?
>
> I committed the first two patches from the series I posted yesterday.
> The first should fix this, and the second relocates parse_manifest.c.
> That patch hasn't changed in a while and seems unlikely to attract
> major objections. There's no real reason to commit it until we're
> ready to move forward with the main patches, but I think we're very
> close to that now, so I did.
>
> Here's a rebase for cfbot.

The v15 patchset (posted yesterday) test results are GOOD:

1. make check-world - GOOD
2. cfbot was GOOD
3. the devel/master bug present in parse_filename_for_nontemp_relation()
   seems to be gone (in local testing)
4. some further tests:
   test_across_wallevelminimal.sh - GOOD
   test_incr_after_timelineincrease.sh - GOOD
   test_incr_on_standby_after_promote.sh - GOOD
   test_many_incrementals_dbcreate.sh - GOOD
   test_many_incrementals.sh - GOOD
   test_multixact.sh - GOOD
   test_pending_2pc.sh - GOOD
   test_reindex_and_vacuum_full.sh - GOOD
   test_repro_assert.sh
   test_standby_incr_just_backup.sh - GOOD
   test_stuck_walsum.sh - GOOD
   test_truncaterollback.sh - GOOD
   test_unlogged_table.sh - GOOD
   test_full_pri__incr_stby__restore_on_pri.sh - GOOD
   test_full_pri__incr_stby__restore_on_stby.sh - GOOD
   test_full_stby__incr_stby__restore_on_pri.sh - GOOD
   test_full_stby__incr_stby__restore_on_stby.sh - GOOD
5. the more real-world pgbench test with localized segment writes using
   `\set aid random_exponential...` [1] indicates much greater efficiency
   in terms of backup space use now; du -sm shows:

   210229   /backups/backups/full
   250      /backups/backups/incr.1
   255      /backups/backups/incr.2
   [..]
   348      /backups/backups/incr.13
   408      /backups/backups/incr.14   // latest (20th of Dec, 10:40)
   6673     /backups/archive/

The DB size, as reported by \l+, was 205 GB. That pgbench run lasted ~27h
(19th Dec 08:39 -> 20th Dec 11:30) at a slow 100 TPS (-R), so no insane
amounts of WAL. Time to reconstruct the 14 chained incremental backups
was 45 min (pg_combinebackup -o /var/lib/postgres/17/data
/backups/backups/full /backups/backups/incr.1 (..)
/backups/backups/incr.14). The DB after recovery was OK and working fine.

-J.
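PS. For reference, the incremental chain above was produced the obvious
way: a full backup first, then each incremental taken against the previous
backup's manifest, roughly like this (paths as in the test; only the first
steps shown, the rest up to incr.14 follow the same pattern):

    $ pg_basebackup -c fast -D /backups/backups/full
    $ pg_basebackup -c fast -D /backups/backups/incr.1 \
          --incremental /backups/backups/full/backup_manifest
    $ pg_basebackup -c fast -D /backups/backups/incr.2 \
          --incremental /backups/backups/incr.1/backup_manifest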