On Wed, Sep 18, 2019 at 9:11 PM David Steele <da...@pgmasters.net> wrote:
> Also consider adding the timestamp.

Sounds reasonable, even if only for the benefit of humans who might look at the file. We can decide later whether to use it for anything else (and third-party tools could make different decisions from core). I assume we're talking about file mtime here, not file ctime or file atime or the time the manifest was generated, but let me know if I'm wrong.

> Consider adding a reference to each file that specifies where the file
> can be found if it is not in this backup. As I understand the
> pg_basebackup proposal, it would only be implementing differential
> backups, i.e. an incremental that is *only* based on the last full
> backup. So, the reference can be inferred in this case. However, if
> the user selects the wrong full backup on restore, and we have labeled
> each backup, then a differential restore with references against the
> wrong full backup would result in a hard error rather than corruption.

I intend that we should be able to support incremental backups based either on a previous full backup or on a previous incremental backup. I am not aware of a technical reason why we need to identify the specific backup that must be used. If incremental backup B is taken based on a pre-existing backup A, then I think that B can be restored using either A or *any other backup taken after A and before B*. In the normal case, there probably wouldn't be any such backup, but AFAICS the start-LSNs are a sufficient cross-check that the chosen base backup is legal.

> Based on my original calculations (which sadly I don't have anymore),
> the combination of SHA1, size, and file name is *extremely* unlikely to
> generate a collision. As in, unlikely to happen before the end of the
> universe kind of unlikely. Though, I guess it depends on your
> expectations for the lifetime of the universe.

Somebody once said that we should be prepared for it to end at any time, or not, and that the time at which it actually was due to end would not be disclosed in advance.
This is probably good life advice which I ought to take more frequently than I do, but I think we can finesse the issue for purposes of this discussion. What I'd say is: if the probability of getting a collision is demonstrably many orders of magnitude less than the probability of the disk writing the block incorrectly, then I think we're probably reasonably OK. Somebody might differ, which is perhaps a mild point in favor of LSN-based approaches, but as a practical matter, if a bad block is a billion times more likely to be the result of a disk error than of a checksum collision, then it's a negligible risk.

> > And maybe a few other bits of metadata, but I'm not sure
> > exactly what. Ideas?
>
> A backup label for sure. You can also use this as the directory/tar
> name to save the user coming up with one. We use YYYYMMDDHH24MMSSF for
> full backups and YYYYMMDDHH24MMSSF_YYYYMMDDHH24MMSS(D|I) for
> incrementals and have logic to prevent two backups from having the same
> label. This is unlikely outside of testing but still a good idea.
>
> Knowing the start/stop time of the backup is useful in all kinds of
> ways, especially monitoring and time-targeted PITR. Start/stop LSN is
> also good. I know this is also in backup_label but having it all in one
> place is nice.
>
> We include the version/sysid of the cluster to avoid mixups. It's a
> great extra check on top of references to be sure everything is kosher.

I don't think it's a good idea to duplicate the information that's already in the backup_label. Storing two copies of the same information is just an invitation to having to worry about what happens if they don't agree.

> A manifest version is good in case we change the format later.

Yeah.

> I'd recommend JSON for the format since it is so ubiquitous and easily
> handles escaping which can be gotchas in a home-grown format.
> We currently have a format that is a combination of Windows INI and JSON
> (for human-readability in theory) and we have become painfully aware of
> escaping issues. Really, why would you drop files with '=' in their
> name in PGDATA? And yet it happens.

I am not crazy about JSON because it requires that I get a JSON parser into src/common, which I could do, but given the possibly-imminent end of the universe, I'm not sure it's the greatest use of time. You're right that if we pick an ad-hoc format, we've got to worry about escaping, which isn't lovely.

> > (1) When taking a backup, have the option (perhaps enabled by default)
> > to include a backup manifest.
>
> Manifests are cheap to build so I wouldn't make it an option.

Huh. That's an interesting idea. Thanks.

> > (3) Cross-check a manifest against a backup and complain about extra
> > files, missing files, size differences, or checksum mismatches.
>
> Verification is the best part of the manifest. Plus, you can do
> verification pretty cheaply on restore. We also restore pg_control last
> so clusters that have a restore error won't start.

There's no "restore" operation here, really. A backup taken by pg_basebackup can be "restored" by copying the whole thing, but it can also be used just where it is. If we were going to build something into some in-core tool to copy backups around, this would be a smart way to implement said tool, but I'm not planning on that myself.

> > One thing I'm not quite sure about is where to store the backup
> > manifest. If you take a base backup in tar format, you get base.tar,
> > pg_wal.tar (unless -Xnone), and an additional tar file per tablespace.
> > Does the backup manifest go into base.tar? Get written into a separate
> > file outside of any tar archive? Something else? And what about a
> > plain-format backup? I suppose then we should just write the manifest
> > into the top level of the main data directory, but perhaps someone has
> > another idea.
>
> We do:
>
> [backup_label]/
>     backup.manifest
>     pg_data/
>     pg_tblspc/
>
> In general, having the manifest easily accessible is ideal.

That's a fine choice for a tool, but I'm talking about something that is part of the actual backup format supported by PostgreSQL, not what a tool might wrap around it. The choice is whether, for a tar-format backup, the manifest goes inside a tar file or as a separate file. To put that another way, a patch adding backup manifests does not get to redesign where pg_basebackup puts anything else; it only gets to decide where to put the manifest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
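For illustration only, the cross-check described in (3) above (extra files, missing files, size differences, checksum mismatches) can be sketched in a few lines of Python. Everything here is a hypothetical assumption, not anything proposed in this thread: the JSON layout and its field names (version, files, path, size, mtime, checksum), SHA-256 as the checksum, the helper names build_manifest and verify_backup, and the restriction to a plain-format backup directory with the manifest stored outside it (otherwise the manifest would itself be reported as an extra file). It also shows that a stock JSON serializer handles escaping of awkward file names, such as one containing '=', without a home-grown quoting scheme.

```python
import hashlib
import json
import os
from typing import Dict, List

# Hypothetical manifest format version, so the layout can change later.
MANIFEST_VERSION = 1


def file_checksum(path: str) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 kB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def list_files(root: str) -> List[str]:
    """Return all non-directory entries under root, relative to root, sorted."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(paths)


def build_manifest(root: str) -> str:
    """Hypothetical: describe every file under root as a JSON manifest."""
    entries = []
    for rel in list_files(root):
        full = os.path.join(root, rel)
        st = os.stat(full)
        entries.append({
            "path": rel,                # json.dumps escapes quotes, backslashes, etc.
            "size": st.st_size,
            "mtime": int(st.st_mtime),  # recorded for humans; not verified below
            "checksum": file_checksum(full),
        })
    return json.dumps({"version": MANIFEST_VERSION, "files": entries}, indent=2)


def verify_backup(root: str, manifest_text: str) -> List[str]:
    """Hypothetical: cross-check root against a manifest and list problems."""
    manifest = json.loads(manifest_text)
    version = manifest.get("version")
    if version != MANIFEST_VERSION:
        return [f"unsupported manifest version: {version!r}"]
    expected: Dict[str, dict] = {e["path"]: e for e in manifest["files"]}
    actual = set(list_files(root))
    wanted = set(expected)

    problems = [f"extra file: {p}" for p in sorted(actual - wanted)]
    problems += [f"missing file: {p}" for p in sorted(wanted - actual)]
    for rel in sorted(actual & wanted):
        full = os.path.join(root, rel)
        size = os.path.getsize(full)
        if size != expected[rel]["size"]:
            # Cheap check first: a size difference needs no hashing.
            problems.append(f"size mismatch: {rel} ({size} != {expected[rel]['size']})")
        elif file_checksum(full) != expected[rel]["checksum"]:
            problems.append(f"checksum mismatch: {rel}")
    return problems
```

Recording mtime without checking it matches the idea above that the timestamp is mainly for the benefit of humans. Comparing sizes before hashing lets the cheap check catch a difference without reading the whole file.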