Re: Regression in statement locations

2025-05-21 Thread David Steele
On 5/20/25 21:28, Michael Paquier wrote: The fix has been applied now on HEAD. I've also checked the state of pgaudit on branch dev-pg18, with the regression getting fixed. Things look clear now, at least from my side. Just retested and it looks good from my side, too. Thanks again! -David

Re: Regression in statement locations

2025-05-20 Thread David Steele
On 5/20/25 07:34, Sami Imseih wrote: Tested the patch and it looks good to me. Not that I thought it would fail, but I also confirmed the pgaudit case works as expected. I also tested and everything looks good with the patch. I did a careful examination of the remaining diffs (there are quite

Regression in statement locations

2025-05-19 Thread David Steele
Hackers, 499edb0 introduced more precise locations for nested statements. In general it works quite well and has made the pgAudit output much more readable -- so kudos for that. However, I have noticed one regression in the pgAudit tests. We have this somewhat odd statement intentionally cra

Re: Improve verification of recovery_target_timeline GUC.

2025-04-25 Thread David Steele
On 4/24/25 20:12, Michael Paquier wrote: On Thu, Apr 24, 2025 at 03:34:29PM +, David Steele wrote: Having said that, I think these tests are awfully expensive for a single GUC. Unit tests would definitely be preferable but that's not an option for GUCs, so far as I know. On my l

Re: Improve verification of recovery_target_timeline GUC.

2025-04-24 Thread David Steele
On 2/14/25 02:42, Michael Paquier wrote: On Fri, Jan 24, 2025 at 01:36:45PM +, David Steele wrote: + timeline = strtoull(*newval, &endp, 0); + + if (*endp != '\0' || errno == EINVAL || e

Re: Fix logging for invalid recovery timeline

2025-02-25 Thread David Steele
On 2/25/25 04:07, Benoit Lobréau wrote: On 2/24/25 11:33 PM, Michael Paquier wrote: On Mon, Feb 24, 2025 at 05:30:35PM +, David Steele wrote: +    /* translator: %s is a backup_label or pg_control file */ See for example PostmasterMain() with the

Re: Fix logging for invalid recovery timeline

2025-02-24 Thread David Steele
On 2/24/25 10:21, Benoit Lobréau wrote: On 2/24/25 2:05 AM, Michael Paquier wrote: I think that you have the right idea here, avoiding the duplication of the errdetail() which feels itchy when looking at the patch. Done this way in the attached patch. This looks good to me. This should ha

Re: Fix logging for invalid recovery timeline

2025-02-22 Thread David Steele
On 2/21/25 09:09, Benoit Lobréau wrote: On 2/20/25 4:40 PM, David Steele wrote: Benoit -- this was your idea. Did you want to submit a patch yourself? Here is an attempt at that. I kept the wording I used above. Is it fine to repeat the whole ereport block twice? I think for translation

Re: Fix logging for invalid recovery timeline

2025-02-20 Thread David Steele
On 2/19/25 19:45, Michael Paquier wrote: On Wed, Feb 19, 2025 at 05:35:18PM +, David Steele wrote: I like this idea but I would prefer to get the patch committed as-is first. The reason is that I'm hoping to see this batch-patched (since it is a bug) and that is less likely if the me

Re: Fix logging for invalid recovery timeline

2025-02-19 Thread David Steele
On 2/19/25 03:51, Benoit Lobréau wrote: I think the changes make sense. Would it be helpful to add the origin of the checkpoint record we are referring to ? (i.e. control file or backup label). For example: * Latest checkpoint in the control file is at %X/%X on timeline %u, * Checkpoint loc

Re: Return pg_control from pg_backup_stop().

2025-01-24 Thread David Steele
On 11/20/24 17:44, David Steele wrote: On 10/3/24 05:11, David Steele wrote: On 10/3/24 07:45, Michael Paquier wrote: 1) is something that has more value than 2), IMO, because there is no need for a manual step when a backup is taken by the replication protocol.  Well, custom backup solutions

Re: Improve verification of recovery_target_timeline GUC.

2025-01-24 Thread David Steele
On 1/24/25 01:44, Michael Paquier wrote: On Thu, Jan 23, 2025 at 02:53:39PM +, David Steele wrote: I discovered this while testing on Postgres versions < 12 where The tests are probably excessive but I needed something to show that the verification works as expected. Even with your pa

Improve verification of recovery_target_timeline GUC.

2025-01-23 Thread David Steele
robably excessive but I needed something to show that the verification works as expected. Regards, -David From f14e30b18cde216131bd3e069ee8ecd5da3301b0 Mon Sep 17 00:00:00 2001 From: David Steele Date: Fri, 20 Dec 2024 15:13:59 + Subject: Fix logging for invalid recovery timeline. If the requested rec

Re: Fwd: Re: A new look at old NFS readdir() problems?

2025-01-08 Thread David Steele
On 1/8/25 12:40, Bruce Momjian wrote: On Tue, Jan 7, 2025 at 01:03:05AM +, David Steele wrote: I'm more concerned about the report we saw on SUSE/NFS [1]. If that report is accurate it indicates this may not be something we can just document and move on from -- unless we are willi

Re: Fwd: Re: A new look at old NFS readdir() problems?

2025-01-06 Thread David Steele
On 1/6/25 19:08, Tom Lane wrote: David Steele writes: On 1/4/25 11:07, Thomas Munro wrote: As for CIFS, there are lots of reports of this sort of thing from Linux CIFS clients. There may be users running Postgres on CIFS but my guess is that is rare -- at least I have never seen anyone

Re: Fwd: Re: A new look at old NFS readdir() problems?

2025-01-06 Thread David Steele
On 1/4/25 11:07, Thomas Munro wrote: On Sat, Jan 4, 2025 at 5:48 AM David Steele wrote: We had one issue reported [1] involving Alpine Linux and CIFS and Not directly relevant for pgbackrest probably, but I noticed that Alpine comes up in a few reports of failing rm -r on CIFS. I think it

Re: Fwd: Re: A new look at old NFS readdir() problems?

2025-01-03 Thread David Steele
On 1/3/25 09:47, Greg Sabino Mullane wrote: On Fri, Jan 3, 2025 at 8:33 AM Robert Haas > wrote: We tried to make our code as robust as it could be in the face of kernel code that behaved in a manner that was fairly ridiculous relative to our needs. This

Re: Fix logging for invalid recovery timeline

2024-12-24 Thread David Steele
On 12/20/24 23:28, Andrey M. Borodin wrote: On 20 Dec 2024, at 20:37, David Steele wrote: "Latest checkpoint is at %X/%X on timeline %u, but in the history of the requested timeline, the server forked off from that timeline at %X/%X." I think errdetai here is very hard to foll

Fix logging for invalid recovery timeline

2024-12-20 Thread David Steele
gging. I think this should also be back-patched. Regards, -DavidFrom f14e30b18cde216131bd3e069ee8ecd5da3301b0 Mon Sep 17 00:00:00 2001 From: David Steele Date: Fri, 20 Dec 2024 15:13:59 + Subject: Fix logging for invalid recovery timeline. If the requested recovery timeline is not reachable

Re: Return pg_control from pg_backup_stop().

2024-11-20 Thread David Steele
On 10/3/24 05:11, David Steele wrote: On 10/3/24 07:45, Michael Paquier wrote: 1) is something that has more value than 2), IMO, because there is no need for a manual step when a backup is taken by the replication protocol.  Well, custom backup solutions that rely on the replication protocol

Re: Return pg_control from pg_backup_stop().

2024-10-03 Thread David Steele
On 10/3/24 07:45, Michael Paquier wrote: On Wed, Oct 02, 2024 at 09:03:27AM +, David Steele wrote: On 10/2/24 10:11, Michael Paquier wrote: I can definitely see us making other updates to pg_control so I would rather keep this logic centralized, even though it is not too complicated at

Re: Return pg_control from pg_backup_stop().

2024-10-02 Thread David Steele
On 10/2/24 10:11, Michael Paquier wrote: On Fri, May 17, 2024 at 12:46:49PM +1000, David Steele wrote: The basic idea is to harden recovery by returning a copy of pg_control from pg_backup_stop() that has a flag set to prevent recovery if the backup_label file is missing. Instead of backup

Re: pg_combinebackup does not detect missing files

2024-08-04 Thread David Steele
On 8/2/24 20:37, Robert Haas wrote: On Fri, Apr 19, 2024 at 11:47 AM Robert Haas wrote: Hmm, that's an interesting perspective. I've always been very skeptical of doing verification only around missing files and not anything else. I figured that wouldn't be particularly meaningful, and that's p

Re: Incremental backup from a streaming replication standby fails

2024-07-19 Thread David Steele
On 7/19/24 21:52, Robert Haas wrote: On Mon, Jul 15, 2024 at 11:27 AM Laurenz Albe wrote: On Sat, 2024-06-29 at 07:01 +0200, Laurenz Albe wrote: I played around with incremental backup yesterday and tried $subject The WAL summarizer is running on the standby server, but when I try to take an

Re: Logging which local address was connected to in log_line_prefix

2024-07-11 Thread David Steele
On 7/11/24 23:09, Greg Sabino Mullane wrote: Thanks for the review. Please find attached a new version with proper tabs and indenting. This looks good to me now. +1 overall for the feature. Regards, -David

Re: Logging which local address was connected to in log_line_prefix

2024-07-08 Thread David Steele
On 5/24/24 22:33, Greg Sabino Mullane wrote: Peter, thank you for the feedback. Attached is a new patch with "address" rather than "interface", plus a new default of "local" if there is no address. I also removed the questionable comment, and updated the commitfest title. I tried the updated

Re: replace strtok()

2024-07-07 Thread David Steele
On 6/24/24 19:57, Peter Eisentraut wrote: On 24.06.24 02:34, Michael Paquier wrote: On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote: Peter Eisentraut writes: On 18.06.24 13:43, Ranier Vilela wrote: I found another implementation of strsep, it seems lighter to me. I will attach it fo

Re: pg_combinebackup does not detect missing files

2024-05-20 Thread David Steele
On 5/21/24 03:09, Robert Haas wrote: On Fri, May 17, 2024 at 6:14 PM David Steele wrote: Then intentionally corrupt a file in the incr backup: $ truncate -s 0 test/backup/incr1/base/5/3764_fsm In this case pg_verifybackup will error: $ pg_verifybackup test/backup/incr1 pg_verifybackup

Re: pg_combinebackup does not detect missing files

2024-05-18 Thread David Steele
On 5/18/24 21:06, Tomas Vondra wrote: On 5/17/24 14:20, Robert Haas wrote: On Fri, May 17, 2024 at 1:18 AM David Steele wrote: However, I think allowing the user to optionally validate the input would be a good feature. Running pg_verifybackup as a separate step is going to be a more

Re: pg_combinebackup does not detect missing files

2024-05-17 Thread David Steele
On 5/17/24 22:20, Robert Haas wrote: On Fri, May 17, 2024 at 1:18 AM David Steele wrote: However, I think allowing the user to optionally validate the input would be a good feature. Running pg_verifybackup as a separate step is going to be a more expensive then verifying/copying at the same

Re: pg_combinebackup does not detect missing files

2024-05-16 Thread David Steele
On 4/25/24 00:05, Robert Haas wrote: On Tue, Apr 23, 2024 at 7:23 PM David Steele wrote: I don't understand what you mean here. I thought we were in agreement that verifying contents would cost a lot more. The verification that we can actually do without much cost can only check for mi

Re: Add recovery to pg_control and remove backup_label

2024-05-16 Thread David Steele
On 4/16/24 18:55, Stefan Fercot wrote: Hi, On Sunday, March 10th, 2024 at 4:47 AM, David Steele wrote: I've had a new idea which may revive this patch. The basic idea is to keep backup_label but also return a copy of pg_control from pg_stop_backup(). This copy of pg_control would be safe

Return pg_control from pg_backup_stop().

2024-05-16 Thread David Steele
I'll register this in the July CF. Regards, -David [1] https://www.postgresql.org/message-id/2daf8adc-8db7-4204-a7f2-a7e94e2bf...@pgmasters.netFrom 531872dcdb09f5d2f89af44a3c04c9d2d6da89be Mon Sep 17 00:00:00 2001 From: David Steele Date: Fri, 17 May 2024 02:42:35 + Subject: Return pg_co

Re: CREATE TABLE/ProcessUtility hook behavior change

2024-05-15 Thread David Steele
On 4/30/24 21:15, jian he wrote: On Tue, Apr 30, 2024 at 4:35 PM David Steele wrote: On 4/30/24 12:57, jian he wrote: On Tue, Apr 30, 2024 at 10:26 AM David Steele wrote: Since bb766cde cannot be readily applied to older commits in master I'm unable to continue bisecting to find the

Re: CREATE TABLE/ProcessUtility hook behavior change

2024-04-30 Thread David Steele
On 4/30/24 12:57, jian he wrote: On Tue, Apr 30, 2024 at 10:26 AM David Steele wrote: Since bb766cde cannot be readily applied to older commits in master I'm unable to continue bisecting to find the ALTER TABLE behavioral change. This seems to leave me with manual code inspection and

CREATE TABLE/ProcessUtility hook behavior change

2024-04-29 Thread David Steele
Hackers, While testing pgAudit against PG17 I noticed the following behavioral change: CREATE TABLE public.test ( id INT, name TEXT, description TEXT, CONSTRAINT test_pkey PRIMARY KEY (id) ); NOTICE: AUDIT: SESSION,23,1,DDL,CREATE TABLE,TABLE,public.test,"CREAT

Re: pg_combinebackup does not detect missing files

2024-04-23 Thread David Steele
On 4/22/24 23:53, Robert Haas wrote: On Sun, Apr 21, 2024 at 8:47 PM David Steele wrote: I figured that wouldn't be particularly meaningful, and that's pretty much the only kind of validation that's even theoretically possible without a bunch of extra overhead, since we compu

Re: pg_combinebackup does not detect missing files

2024-04-21 Thread David Steele
On 4/20/24 01:47, Robert Haas wrote: On Thu, Apr 18, 2024 at 7:36 PM David Steele wrote: Sure -- pg_combinebackup would only need to verify the data that it uses. I'm not suggesting that it should do an exhaustive verify of every single backup in the chain. Though I can see how it sounded

Re: pg_combinebackup does not detect missing files

2024-04-18 Thread David Steele
On 4/19/24 00:50, Robert Haas wrote: On Wed, Apr 17, 2024 at 7:09 PM David Steele wrote: Fair enough. I accept that your reasoning is not random, but I'm still not very satisfied that the user needs to run a separate and rather expensive process to do the verification when pg_combineb

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-18 Thread David Steele
On 4/19/24 01:10, Robert Haas wrote: On Wed, Apr 17, 2024 at 7:56 PM David Steele wrote: Thanks! I've tested this and it works as advertised. Ideally I'd want an error on backup if there is a similar file in any data directories that would cause an error on combine, but I admit

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-17 Thread David Steele
On 4/18/24 00:14, Robert Haas wrote: On Tue, Apr 16, 2024 at 9:25 AM Robert Haas wrote: On Mon, Apr 15, 2024 at 10:12 PM David Steele wrote: Anyway, I think it should be fixed or documented as a caveat since it causes a hard failure on restore. Alright, I'll look into this. Her

Re: pg_combinebackup does not detect missing files

2024-04-17 Thread David Steele
On 4/18/24 01:03, Robert Haas wrote: On Tue, Apr 16, 2024 at 7:25 PM David Steele wrote: But it will not go out of its way to perform checks that are unrelated to its documented purpose. And I don't think it should, especially if we have another tool that already does that. I'm not

Re: pg_combinebackup does not detect missing files

2024-04-16 Thread David Steele
On 4/16/24 23:50, Robert Haas wrote: On Wed, Apr 10, 2024 at 9:36 PM David Steele wrote: I've been playing around with the incremental backup feature trying to get a sense of how it can be practically used. One of the first things I always try is to delete random files and see what ha

Re: pg_combinebackup fails on file named INCREMENTAL.*

2024-04-15 Thread David Steele
On 4/16/24 06:33, Robert Haas wrote: On Sun, Apr 14, 2024 at 11:53 PM David Steele wrote: Since incremental backup is using INCREMENTAL as a keyword (rather than storing incremental info in the manifest) it is vulnerable to any file in PGDATA with the pattern INCREMENTAL.*. Yeah, that's

pg_combinebackup fails on file named INCREMENTAL.*

2024-04-14 Thread David Steele
Hackers, Since incremental backup is using INCREMENTAL as a keyword (rather than storing incremental info in the manifest) it is vulnerable to any file in PGDATA with the pattern INCREMENTAL.*. For example: $ pg_basebackup -c fast -D test/backup/full -F plain $ touch test/data/INCREMENTAL.CO

Re: post-freeze damage control

2024-04-14 Thread David Steele
On 4/13/24 21:02, Tomas Vondra wrote: On 4/13/24 01:23, David Steele wrote: Even for the summarizer, though, I do worry about the complexity of maintaining it over time. It seems like it would be very easy to introduce a bug and have it go unnoticed until it causes problems in the field. A lot

Re: post-freeze damage control

2024-04-12 Thread David Steele
On 4/12/24 22:27, Tomas Vondra wrote: On 4/12/24 08:42, David Steele wrote: On 4/11/24 20:26, Tomas Vondra wrote: On 4/11/24 03:52, David Steele wrote: On 4/11/24 10:23, Tom Kincaid wrote: The extensive Beta process we have can be used to build confidence we need in a feature that has

Re: post-freeze damage control

2024-04-12 Thread David Steele
On 4/12/24 22:12, Tomas Vondra wrote: On 4/11/24 23:48, David Steele wrote: On 4/11/24 20:26, Tomas Vondra wrote: >> FWIW that discussion also mentions stuff that I think the feature should not do. In particular, I don't think the ambition was (or should be) to make pg_basebackup i

Re: Add notes to pg_combinebackup docs

2024-04-12 Thread David Steele
On 4/12/24 22:40, Tomas Vondra wrote: On 4/12/24 11:50, David Steele wrote: On 4/12/24 19:09, Magnus Hagander wrote: On Fri, Apr 12, 2024 at 12:14 AM David Steele > > But yeah, having to keep the backups as expanded directories is not > great, I'd love to

Re: Add notes to pg_combinebackup docs

2024-04-12 Thread David Steele
On 4/12/24 19:09, Magnus Hagander wrote: On Fri, Apr 12, 2024 at 12:14 AM David Steele OK, sure, but if the plan is to make it practical later doesn't that make the feature something to be avoided now? That could be said for any feature. When we shipped streaming replication

Re: post-freeze damage control

2024-04-11 Thread David Steele
On 4/11/24 20:26, Tomas Vondra wrote: On 4/11/24 03:52, David Steele wrote: On 4/11/24 10:23, Tom Kincaid wrote: The extensive Beta process we have can be used to build confidence we need in a feature that has extensive review and currently has no known issues or outstanding objections. I

Re: post-freeze damage control

2024-04-11 Thread David Steele
On 4/12/24 12:15, Robert Haas wrote: On Thu, Apr 11, 2024 at 5:48 PM David Steele wrote: But they'll try because it is a new pg_basebackup feature and they'll assume it is there to be used. Maybe it would be a good idea to make it clear in the documentation that significant tooli

Re: Add notes to pg_combinebackup docs

2024-04-11 Thread David Steele
On 4/11/24 20:51, Tomas Vondra wrote: On 4/11/24 02:01, David Steele wrote: I have a hard time seeing this feature as being very useful, especially for large databases, until pg_combinebackup works on tar (and compressed tar). Right now restoring an incremental requires at least twice the

Re: post-freeze damage control

2024-04-11 Thread David Steele
On 4/11/24 20:26, Tomas Vondra wrote: On 4/11/24 03:52, David Steele wrote: On 4/11/24 10:23, Tom Kincaid wrote: The extensive Beta process we have can be used to build confidence we need in a feature that has extensive review and currently has no known issues or outstanding objections. I

Re: post-freeze damage control

2024-04-10 Thread David Steele
On 4/11/24 10:23, Tom Kincaid wrote: The extensive Beta process we have can be used to build confidence we need in a feature that has extensive review and currently has no known issues or outstanding objections. I did have objections, here [1] and here [2]. I think the complexity, space req

pg_combinebackup does not detect missing files

2024-04-10 Thread David Steele
Hackers, I've been playing around with the incremental backup feature trying to get a sense of how it can be practically used. One of the first things I always try is to delete random files and see what happens. You can delete pretty much anything you want from the most recent incremental ba

Re: Add notes to pg_combinebackup docs

2024-04-10 Thread David Steele
On 4/9/24 19:44, Tomas Vondra wrote: On 4/9/24 09:59, Martín Marqués wrote: Hello, While doing some work/research on the new incremental backup feature some limitations were not listed in the docs. Mainly the fact that pg_combienbackup works with plain format and not tar. Right. The docs mo

Re: post-freeze damage control

2024-04-10 Thread David Steele
On 4/10/24 09:50, Michael Paquier wrote: On Wed, Apr 10, 2024 at 09:29:38AM +1000, David Steele wrote: Even so, only keeping WAL for the last backup is a dangerous move in any case. Lots of things can happen to a backup (other than bugs in the software) so keeping WAL back to the last full

Re: post-freeze damage control

2024-04-09 Thread David Steele
On 4/10/24 01:59, Andrey M. Borodin wrote: On 9 Apr 2024, at 18:45, Alvaro Herrera wrote: Maybe we should explicitly advise users to not delete that WAL from their archives, until pg_combinebackup is hammered a bit more. As a backup tool maintainer, I always reference to out-of-the box Post

Re: Possibility to disable `ALTER SYSTEM`

2024-03-20 Thread David Steele
On 3/20/24 22:30, Michael Banck wrote: On Tue, Mar 19, 2024 at 10:51:50AM -0400, Tom Lane wrote: Heikki Linnakangas writes: Perhaps we could make that even better with a GUC though. I propose a GUC called 'configuration_managed_externally = true / false". If you set it to true, we prevent ALT

Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele
On 3/15/24 18:32, Michael Paquier wrote: On Fri, Mar 15, 2024 at 06:23:15PM +1300, David Steele wrote: Well, this is what we recommend in the docs, i.e. using bin mode to save backup_label, so it seems OK to me. Indeed, I didn't notice that this is actually documented, so what I did too

Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele
On 3/15/24 12:38, Michael Paquier wrote: On Fri, Mar 15, 2024 at 09:40:38AM +1300, David Steele wrote: Is the missing test in meson the reason we did not see test failures for Windows in CI? The test has to be listed in src/test/recovery/meson.build or the CI would ignore it. Right -- I

Re: Add basic tests for the low-level backup method.

2024-03-14 Thread David Steele
On 3/14/24 20:00, Michael Paquier wrote: On Thu, Mar 14, 2024 at 09:12:52AM +1300, David Steele wrote: I think you are right that the start message is better since it can only appear once when the backup_label is found. The completed message could in theory appear after a restart, though the

Re: Add basic tests for the low-level backup method.

2024-03-13 Thread David Steele
On 3/13/24 19:15, Michael Paquier wrote: On Wed, Mar 13, 2024 at 01:12:28PM +1300, David Steele wrote: Not sure what to look for here. There are no distinct messages for crash recovery. Perhaps there should be? The closest thing I can think of here would be "database system was not pro

Re: Add basic tests for the low-level backup method.

2024-03-12 Thread David Steele
On 2/29/24 16:55, Michael Paquier wrote: On Thu, Feb 29, 2024 at 10:30:52AM +1300, David Steele wrote: On 11/12/23 08:21, David Steele wrote: As promised in [1], attached are some basic tests for the low-level backup method. Added to the 2024-03 CF. There is already

Re: Add checkpoint/redo LSNs to recovery errors.

2024-03-09 Thread David Steele
On 2/29/24 16:42, Michael Paquier wrote: On Thu, Feb 29, 2024 at 10:53:15AM +1300, David Steele wrote: This patch adds checkpoint/redo LSNs to recovery error messages where they may be useful for debugging. I've scanned a bit xlogrecovery.c, and I have spotted a couple of that could gain

Re: Add recovery to pg_control and remove backup_label

2024-03-09 Thread David Steele
On 1/29/24 12:28, David Steele wrote: On 1/28/24 19:11, Michael Paquier wrote: On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3511.log With the recent introduction of incremental backups that

Add checkpoint/redo LSNs to recovery errors.

2024-02-28 Thread David Steele
9678d0cd928c58e4a3d038c8aa4c191f5bdddbe7 Mon Sep 17 00:00:00 2001 From: David Steele Date: Wed, 28 Feb 2024 21:45:39 + Subject: Add checkpoint/redo LSNs to recovery errors. When backup_label is present the LSNs are already output in a log message, but it still seems like a good idea to repeat them. When

Re: Add basic tests for the low-level backup method.

2024-02-28 Thread David Steele
On 11/12/23 08:21, David Steele wrote: As promised in [1], attached are some basic tests for the low-level backup method. Added to the 2024-03 CF. Regards, -David

Re: Use of backup_label not noted in log

2024-01-29 Thread David Steele
On 1/28/24 20:09, Michael Paquier wrote: On Fri, Jan 26, 2024 at 12:08:46PM +0900, Michael Paquier wrote: Well, I'm OK with this consensus on 1d35f705e if folks think this is useful enough for all the stable branches. I have done that down to REL_15_STABLE for now as this is able to apply clea

Re: Add recovery to pg_control and remove backup_label

2024-01-28 Thread David Steele
On 1/28/24 19:11, Michael Paquier wrote: On Fri, Jan 26, 2024 at 06:27:30PM +0530, vignesh C wrote: Please post an updated version for the same. [1] - http://cfbot.cputube.org/patch_46_3511.log With the recent introduction of incremental backups that depend on backup_label and the rather nega

Re: Use of backup_label not noted in log

2024-01-26 Thread David Steele
On 1/25/24 20:52, Michael Paquier wrote: On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote: I would still advocate for a back patch here. It is frustrating to get logs from users that just say: LOG: invalid checkpoint record PANIC: could not locate a valid checkpoint record It

Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele
On 1/25/24 17:42, Tom Lane wrote: David Steele writes: Another thing to note here -- knowing the LSN is important but also knowing that backup recovery was attempted (i.e. backup_label exists) is really crucial. Knowing both just saves so much time in back and forth debugging. It appears

Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele
On 1/25/24 09:29, Michael Banck wrote: Hi, On Thu, Jan 25, 2024 at 08:56:52AM -0400, David Steele wrote: I would still advocate for a back patch here. It is frustrating to get logs from users that just say: LOG: invalid checkpoint record PANIC: could not locate a valid checkpoint record It

Re: Use of backup_label not noted in log

2024-01-25 Thread David Steele
On 1/25/24 04:12, Michael Paquier wrote: On Mon, Jan 22, 2024 at 04:36:27PM +0900, Michael Paquier wrote: + if (ControlFile->backupStartPoint != InvalidXLogRecPtr) Nit 1: I would use XLogRecPtrIsInvalid here. + ereport(LOG, + (errmsg("completed backup recovery with re

Re: Use of backup_label not noted in log

2024-01-19 Thread David Steele
On 11/20/23 15:08, David Steele wrote: On 11/20/23 14:27, Andres Freund wrote: I've wondered whether it's worth also adding an explicit message just after ReachedEndOfBackup(), but it seems far less urgent due to the existing "consistent recovery state reached at %X/%X" me

Re: Detecting some cases of missing backup_label

2023-12-21 Thread David Steele
On 12/21/23 07:37, Andres Freund wrote: On 2023-12-20 13:11:37 -0400, David Steele wrote: I've run this through a bunch of scenarios (in my head) with parallel backups and it does seem to hold up. I think we'd need to write the state file before XLOG_BACKUP_START just in case. Seems

Re: Detecting some cases of missing backup_label

2023-12-20 Thread David Steele
On 12/18/23 10:39, Stephen Frost wrote: Greetings, * Stephen Frost (sfr...@snowman.net) wrote: * Andres Freund (and...@anarazel.de) wrote: I recently mentioned to Robert (and also Heikki earlier), that I think I see a way to detect an omitted backup_label in a relevant subset of the cases (it'

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/21/23 16:00, Andres Freund wrote: Hi, On 2023-11-21 14:48:59 -0400, David Steele wrote: I'd not call 7.06->4.77 or 6.76->4.77 "virtually free". OK, but how does that look with compression With compression it's obviously somewhat different - but tha

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/21/23 13:59, Andres Freund wrote: On 2023-11-21 13:41:15 -0400, David Steele wrote: On 11/20/23 16:41, Andres Freund wrote: On 2023-11-20 15:56:19 -0400, David Steele wrote: I understand this is an option -- but does it need to be? What is the benefit of excluding the manifest? It&#

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/20/23 16:37, Andres Freund wrote: On 2023-11-20 11:11:13 -0500, Robert Haas wrote: I think we need more votes to make a change this big. I have a concern, which I think I've expressed before, that we keep whacking around the backup APIs, and that has a cost which is potentially larger tha

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/20/23 16:41, Andres Freund wrote: On 2023-11-20 15:56:19 -0400, David Steele wrote: I understand this is an option -- but does it need to be? What is the benefit of excluding the manifest? It's not free to create the manifest, particularly if checksums are enabled. It's

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/21/23 12:41, Andres Freund wrote: On 2023-11-21 07:42:42 -0400, David Steele wrote: On 11/20/23 19:58, Andres Freund wrote: On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: Given that, I wonder if what we should do is

Re: Add recovery to pg_control and remove backup_label

2023-11-21 Thread David Steele
On 11/20/23 19:58, Andres Freund wrote: On 2023-11-21 08:52:08 +0900, Michael Paquier wrote: On Mon, Nov 20, 2023 at 12:37:46PM -0800, Andres Freund wrote: Given that, I wonder if what we should do is to just add a new field to pg_control that says "error out if backup_label does not exist", t

Re: Use of backup_label not noted in log

2023-11-21 Thread David Steele
On 11/20/23 23:54, Michael Paquier wrote: On Mon, Nov 20, 2023 at 03:31:20PM -0400, David Steele wrote: I still wonder if we need "base backup" in the messages? That sort of implies (at least to me) you used pg_basebackup but that may not be the case. Or just s/base backup/backup/

Re: Add recovery to pg_control and remove backup_label

2023-11-20 Thread David Steele
On 11/20/23 15:47, Robert Haas wrote: On Mon, Nov 20, 2023 at 2:41 PM David Steele wrote: I can't see why a backup would continue to be valid without a manifest -- that's not very standard for backup software. If you have the critical info in backup_label, you can't afford to lo

Re: Add recovery to pg_control and remove backup_label

2023-11-20 Thread David Steele
On 11/20/23 14:44, Robert Haas wrote: On Mon, Nov 20, 2023 at 12:54 PM David Steele wrote: Another thing we could do is explicitly error if we see backup_label in PGDATA during recovery. That's just a few lines of code so would not be a big deal to maintain. This error would only be visib

Re: Use of backup_label not noted in log

2023-11-20 Thread David Steele
On 11/20/23 15:03, Andres Freund wrote: On 2023-11-20 11:35:15 +0100, Laurenz Albe wrote: If we add a message for starting with "backup_label", shouldn't we also add a corresponding message for starting from a checkpoint found in the control file? If you see that in a problem report, you immed

Re: Use of backup_label not noted in log

2023-11-20 Thread David Steele
On 11/20/23 14:27, Andres Freund wrote: Hi, On 2023-11-19 14:28:12 -0400, David Steele wrote: On 11/18/23 17:49, Andres Freund wrote: On 2023-11-18 10:01:42 -0800, Andres Freund wrote: Not enamored with the phrasing of the log messages, but here's a prototype: When starting up

Re: Add recovery to pg_control and remove backup_label

2023-11-20 Thread David Steele
On 11/20/23 12:11, Robert Haas wrote: On Sun, Nov 19, 2023 at 8:16 PM Michael Paquier wrote: (I am not exactly sure how, but we've lost pgsql-hackers on the way when you sent v5. Now added back in CC with the two latest patches you've proposed attached.) Here is a short summary of what has

Re: Use of backup_label not noted in log

2023-11-20 Thread David Steele
On 11/20/23 12:24, Robert Haas wrote: On Mon, Nov 20, 2023 at 5:35 AM Laurenz Albe wrote: I can accept that adding log messages to back branches is ok. Perhaps I am too nervous about things like that, because as an extension developer I have been bitten too often by ABI breaks in minor releases

Re: Use of backup_label not noted in log

2023-11-20 Thread David Steele
On 11/20/23 06:35, Laurenz Albe wrote: If we add a message for starting with "backup_label", shouldn't we also add a corresponding message for starting from a checkpoint found in the control file? If you see that in a problem report, you immediately know what is going on. +1. It is easier to

Re: Use of backup_label not noted in log

2023-11-20 Thread David Steele
[Resending since I accidentally replied off-list] On 11/18/23 17:49, Andres Freund wrote: On 2023-11-18 10:01:42 -0800, Andres Freund wrote: What about adding it to the "redo starts at" message, something like redo starts at 12/12345678, taken from control file or redo starts at 12/123

Re: Add recovery to pg_control and remove backup_label

2023-11-20 Thread David Steele
moval of the backup_label file. This looks right to me. On Sun, Nov 19, 2023 at 02:14:32PM -0400, David Steele wrote: On 11/15/23 20:03, Michael Paquier wrote: As the label is only an informational field, the parsing added to pg_verifybackup is not really needed because it is used nowhere in the

Re: Use of backup_label not noted in log

2023-11-18 Thread David Steele
On 11/17/23 01:41, Laurenz Albe wrote: On Thu, 2023-11-16 at 20:18 -0800, Andres Freund wrote: I've often had to analyze what caused corruption in PG instances, where the symptoms match not having had backup_label in place when bringing on the node. However that's surprisingly hard - the only lo

Re: Use of backup_label not noted in log

2023-11-18 Thread David Steele
On 11/17/23 00:18, Andres Freund wrote: I've often had to analyze what caused corruption in PG instances, where the symptoms match not having had backup_label in place when bringing on the node. However that's surprisingly hard - the only log messages that indicate use of backup_label are at DEB

Add basic tests for the low-level backup method.

2023-11-11 Thread David Steele
at least supplies some basic tests and provides a place for future improvement. Regards, -David [1] https://www.postgresql.org/message-id/2daf8adc-8db7-4204-a7f2-a7e94e2bfa4b%40pgmasters.netFrom b1a10ab4ab912e6f174206fcb8ce67496f93df83 Mon Sep 17 00:00:00 2001 From: David Steele Date: Sat, 11

Re: Add recovery to pg_control and remove backup_label

2023-11-10 Thread David Steele
On 11/10/23 00:37, Michael Paquier wrote: On Tue, Nov 07, 2023 at 05:20:27PM +0900, Michael Paquier wrote: On Mon, Nov 06, 2023 at 05:39:02PM -0400, David Steele wrote: I've retested today, and miss the failure. I'll let you know if I see this again. I've done a few more

Re: Add recovery to pg_control and remove backup_label

2023-11-06 Thread David Steele
On 11/6/23 02:35, Michael Paquier wrote: On Sun, Nov 05, 2023 at 01:45:39PM -0400, David Steele wrote: Rebased on 151ffcf6. I like this patch a lot. Even if the backup_label file is removed, we still have all the debug information from the backup history file, thanks to its LABEL, BACKUP

Re: Add recovery to pg_control and remove backup_label

2023-11-06 Thread David Steele
On 11/6/23 01:05, Michael Paquier wrote: On Fri, Oct 27, 2023 at 10:10:42AM -0400, David Steele wrote: We are still planning to address this issue in the back branches. FWIW, redesigning the backend code in charge of doing base backups in the back branches is out of scope. Based on a read of

  1   2   3   4   5   6   7   8   9   10   >