Re: Add recovery to pg_control and remove backup_label

David Steele Fri, 27 Oct 2023 07:11:01 -0700

On 10/26/23 17:27, David G. Johnston wrote:

On Thu, Oct 26, 2023 at 2:02 PM David Steele <da...@pgmasters.net> wrote:
Are we planning on dealing with torn writes in the back branches in some way or are we just throwing in the towel and saying the old method is too error-prone to exist/retain


We are still planning to address this issue in the back branches.

and therefore the goal of the v17 changes is to not only provide a better way but also to ensure the old way no longer works? It seems sufficient to change the output signature of pg_backup_stop to accomplish that goal though I am pondering whether an explicit check and error for seeing the backup_label file would be warranted.

Well, if the backup tool is just copying the second column of output to the backup_label, then it won't break. Of course, in that case restores won't work correctly, but you would not get an error. Testing would show that it is not working properly, and backup tools should certainly be tested.

Even so, I'm OK with an explicit check for backup_label. Let's see what others think.

If we are going to solve the torn writes problem completely then while I agree the new way is superior, implementing it doesn't have to mean existing tools built to produce backup_label and rely upon the pg_control in the data directory need to be forcibly broken.

It is a pretty easy update to any backup software that supports non-exclusive backup. I was able to make the changes to pgBackRest in less than an hour. We've made major changes to backup and restore in almost every major version of PostgreSQL for a while: non-exclusive backup in 9.6, dir renames in 10, variable WAL size in 11, new recovery location in 12, hard recovery target errors in 13, and changes to non-exclusive backup and removal of exclusive backup in 15. In 17 we are already looking at new page and segment sizes.

    I know that outputting pg_control as bytea is going to be a bit
    controversial. Software that uses psql to run pg_backup_stop()
    could use encode() to get pg_control as text and then decode it later.
    Alternately, we could update ReadControlFile() to recognize a
    base64-encoded pg_control file. I'm not sure dealing with binary
    data is that much of a problem, though, and if the backup software
    gets it wrong then recovery will fail on an invalid pg_control file.
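
A minimal sketch of the text round trip mentioned above, assuming the backup software received pg_control as base64 text (for example from SQL encode(..., 'base64')) and has to turn it back into bytes before writing the file. The helper name decode_pg_control and the sample bytes are hypothetical; this is not part of the proposed patch.

```python
import base64
import binascii


def decode_pg_control(encoded_text: str) -> bytes:
    """Decode base64 text back into the raw bytes it was produced from.

    Base64 output is often line-wrapped, so all whitespace is removed
    first and the remainder is decoded strictly.
    """
    compact = "".join(encoded_text.split())
    try:
        return base64.b64decode(compact, validate=True)
    except binascii.Error as exc:
        # Fail at backup time rather than at recovery time.
        raise ValueError(f"pg_control text is not valid base64: {exc}") from exc


# Stand-in bytes for illustration only; a real pg_control file has its
# own binary layout, which this sketch does not inspect.
sample = bytes(range(256)) * 2
encoded = base64.encodebytes(sample).decode("ascii")  # line-wrapped text
assert decode_pg_control(encoded) == sample
```

Strict decoding is a design choice here: a tool that mangles the text gets an error immediately instead of storing an invalid pg_control file in the backup.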
Can we not figure out some way to place the relevant files onto the server somewhere so that a simple "cp" command would work? Have pg_backup_stop return paths instead of contents, those paths being "$TEMP_DIR"/<random unique new directory>/pg_control.conf (and tablespace_map)

Nobody has been able to figure this out, and some of us have been thinking about it for years. It just doesn't seem possible to reliably tell the difference between a cluster that was copied and one that simply crashed.

If cp is really the backup tool being employed, I would recommend using pg_basebackup. cp has flaws that could lead to corruption, and of course does not at all take into account the archive required to make a backup consistent, directories to be excluded, the order of copying pg_control on backup from standby, etc., etc.

Backup/restore is not a simple endeavor and we don't do anyone any favors by pretending that it is.


Regards,
-David
