On Tue, Aug 6, 2024 at 1:49 AM Michael Paquier <mich...@paquier.xyz> wrote:
> dikkop has reported a failure with the regression tests of
> pg_combinebackup:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-08-04%2010%3A04%3A51
>
> That's in the test 003_timeline.pl, from dc212340058b:
>
> # Failed test 'incremental backup from node1'
> # at t/003_timeline.pl line 43.
>
> The node is extremely slow, so perhaps bumping up the timeout would be
> fine enough in this case (did not spend time analyzing it). I don't
> think that this has been discussed, but perhaps I just missed a
> reference to it and the incremental backup thread is quite large.
I just noticed, rather belatedly, that this thread is on the open
items list. This seems to be the cause of the failure:

2024-08-04 12:46:34.986 UTC [4951:15] 003_timeline.pl STATEMENT:  START_REPLICATION SLOT "pg_basebackup_4951" 0/4000000 TIMELINE 1
2024-08-04 12:47:34.987 UTC [4951:16] 003_timeline.pl LOG:  terminating walsender process due to replication timeout

wal_sender_timeout is 60s by default, so that tracks. The command that
provokes this failure is:

pg_basebackup -D /mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/bin/pg_combinebackup/tmp_check/t_003_timeline_node1_data/backup/backup2 --no-sync -cfast --incremental /mnt/data/buildfarm/buildroot/HEAD/pgsql.build/src/bin/pg_combinebackup/tmp_check/t_003_timeline_node1_data/backup/backup1/backup_manifest

All we're doing here is taking an incremental backup of a 1-table
database that had 1 row at the time of the full backup and has had 1
more row inserted since then. On my system, the last time I ran this
regression test, this step completed in 410ms. It shouldn't be
expensive, so I'm inclined to chalk this up to the machine not having
enough resources.

The only thing that I don't really understand is why this particular
test would fail rather than anything else. We have a bunch of tests
that take backups. A possibly important difference here is that this
one is an incremental backup, so it needs to read the WAL summary
files covering the range from the start of the full backup to the
start of the current backup and combine them into one super-summary,
which it can then use to decide what to include in the incremental
backup. However, since this is an artificial example with just 1
insert between the full backup and the incremental one, it's hard to
imagine that being expensive, unless there's some low-probability bug
that makes it go into an infinite loop or chew up a million CPU
cycles or something. That's not impossible, but given the discussion
between you and Tomas, I'm kinda hoping it was just a hardware issue.

Barring objections or other similar trouble reports, I think we should
just close out this open item.

--
Robert Haas
EDB: http://www.enterprisedb.com
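
P.S. For anyone who wants to poke at this on the affected animal, a
quick way to sanity-check the WAL-summary theory (untested here, and
assuming a server new enough to have incremental backup) is to list
the summaries that the incremental backup would have had to combine:

-- each row is one WAL summary file; the interval between the two
-- backups should be covered by a handful of entries
SELECT * FROM pg_available_wal_summaries() ORDER BY start_lsn;

And if we conclude that the machine is simply slow, the timeout bump
you suggested upthread could be tried by hand with something like
this (120s is just an arbitrary example value):

-- default is 60s; pg_reload_conf() applies the change without a restart
ALTER SYSTEM SET wal_sender_timeout = '120s';
SELECT pg_reload_conf();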