On Wed, Jul 31, 2019 at 1:59 PM vignesh C <vignes...@gmail.com> wrote:
> I feel Robert's suggestion is good.
> We can probably keep one meta file for each backup with some basic information
> of all the files being backed up, this metadata file will be useful in the
> below case:
> Table dropped before incremental backup
> Table truncated and Insert/Update/Delete operations before incremental backup
There's really no need for this with the design I proposed. The files that should exist when you restore an incremental backup are exactly the set of files that exist in the final incremental backup, except that any .partial files need to be replaced with a correct reconstruction of the underlying file. You don't need to know what got dropped or truncated; you only need to know what's supposed to be there at the end.

You may be thinking, as I once did, that restoring an incremental backup would consist of restoring the full backup first and then layering the incrementals over it, but if you read what I proposed, it actually works the other way around: you restore the files that are present in the incremental, and as needed, pull pieces of them from earlier incremental and/or full backups (see the sketch at the end of this mail). I think this is a *much* better design than doing it the other way; it avoids any risk of getting the wrong answer due to truncations or drops, and it is also faster, because you only read older backups to the extent that you actually need their contents.

I think it's a good idea to try to keep all the information about a single file being backed up in one place. It's just less confusing. If, for example, you have a metadata file that tells you which files are dropped - that is, which files you DON'T have - then what happens if one of those files is present in the data directory after all? Well, then you have inconsistent information and are confused, and maybe your code won't even notice the inconsistency. Similarly, if the metadata file is separate from the block data, then what happens if one file is missing, or isn't from the same backup as the other file? That shouldn't happen, of course, but if it does, you'll get confused. There's no perfect solution to these kinds of problems: if we suppose that the backup can be corrupted by having missing or extra files, why not also corruption within a single file? Still, on balance I tend to think that keeping related stuff together minimizes the surface area for bugs. I realize that's arguable, though.

One consideration that goes the other way: if you have a manifest file that says what files are supposed to be present in the backup, then you can detect a disappearing file, which is impossible with the design I've proposed (and with the current full backup machinery). That might be worth fixing, but it's a separate feature that has little to do with incremental backup.

> Probably it can also help us to decide which work the worker needs to do
> if we are planning to backup in parallel.

I don't think we need a manifest file for parallel backup. One process or thread can scan the directory tree, make a list of which files are present, and then hand individual files off to other processes or threads. In short, the directory listing serves as the manifest (again, sketched below).
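To make the restore side concrete, here's a rough, untested sketch of the reconstruction step described above (in Python purely for brevity; the real thing would of course be C). The Backup class and its per-file block map are invented for illustration, since the actual .partial format hasn't been nailed down:

BLCKSZ = 8192

class Backup:
    def __init__(self, complete=None, partial=None):
        # complete: {path: bytes} for files stored whole
        # partial:  {path: (nblocks, {blockno: bytes})} for .partial files
        self.complete = complete or {}
        self.partial = partial or {}

def restore_file(path, chain):
    """Reconstruct one file; chain[0] is the newest backup, chain[-1] the full."""
    if path in chain[0].complete:
        return chain[0].complete[path]
    # The newest backup alone decides the file's final length, so drops
    # and truncations can't lead us astray.
    nblocks, _ = chain[0].partial[path]
    blocks = [None] * nblocks
    for backup in chain:
        if path in backup.complete:
            # Full copy of the file: any block not seen in a newer
            # partial must be unchanged since this backup, so take it.
            data = backup.complete[path]
            for i in range(nblocks):
                if blocks[i] is None:
                    blocks[i] = data[i * BLCKSZ:(i + 1) * BLCKSZ]
            break
        _, have = backup.partial.get(path, (0, {}))
        for i, blk in have.items():
            if i < nblocks and blocks[i] is None:   # newer blocks win
                blocks[i] = blk
        if not any(b is None for b in blocks):
            break   # stop early: no need to read still-older backups
    assert not any(b is None for b in blocks)   # a hole means a broken chain
    return b"".join(blocks)

Note how the speed argument shows up directly in the code: the loop bails out as soon as every block is accounted for, so older backups get read only to the extent that their contents are actually needed.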
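And here's the same kind of throwaway sketch for the parallel case: one walker enumerates the directory tree and hands individual files to a pool of workers, with no manifest file anywhere. (ThreadPoolExecutor and shutil stand in for whatever process or thread machinery we'd really use.)

import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def backup_one(src, datadir, destdir):
    # Copy a single file, preserving its position in the tree.
    rel = os.path.relpath(src, datadir)
    dest = os.path.join(destdir, rel)
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.copy2(src, dest)
    return rel

def parallel_backup(datadir, destdir, nworkers=4):
    # The walker's listing is the only "manifest": whatever files exist
    # right now are exactly the files that get backed up.
    paths = [os.path.join(root, name)
             for root, dirs, files in os.walk(datadir)
             for name in files]
    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        return list(pool.map(lambda p: backup_one(p, datadir, destdir),
                             paths))

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company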