On Sun, Apr 14, 2024 at 11:53 PM David Steele <da...@pgmasters.net> wrote: > Since incremental backup is using INCREMENTAL as a keyword (rather than > storing incremental info in the manifest) it is vulnerable to any file > in PGDATA with the pattern INCREMENTAL.*.
Yeah, that's true. I'm not greatly concerned about this, because I think anyone who creates a file called INCREMENTAL.CONFIG is just being perverse. However, we could ignore those files just in subtrees where they're not expected to occur i.e. only pay attention to them under base, global, and pg_tblspc. I didn't do this because I thought someone might eventually want to do something like incremental backup of SLRU files, and then it would be annoying if there were a bunch of random pathname restrictions. But if we're really concerned about this, I think it would be a reasonable fix; you're not really supposed to decide to store your configuration files in directories that exist for the purpose of storing relation data files. > We could fix the issue by forbidding this file pattern in PGDATA, i.e. > error when it is detected during pg_basebackup, but in my view it would > be better (at least eventually) to add incremental info to the manifest. > That would also allow us to skip storing zero-length files and > incremental stubs (with no changes) as files. I did think about this, and I do like the idea of being able to elide incremental stubs. If you have a tremendous number of relations and very few of them have changed at all, the incremental stubs might take up a significant percentage of the space consumed by the incremental backup, which kind of sucks, even if the incremental backup is still quite small compared to the original database. And, like you, I had the idea of trying to use the backup_manifest to do it. But ... I didn't really end up feeling very comfortable with it. Right now, the backup manifest is something we only use to verify the integrity of the backup. If we were to do this, it would become a critical part of the backup. I don't think I like the idea that removing the backup_manifest should be allowed to, in effect, corrupt the backup. But I think I dislike even more the idea that the data that is used to verify the backup gets mushed together with the backup data itself. Maybe in practice it's fine, but it doesn't seem very conceptually clean. There are also some practical considerations that made me not want to go in this direction: we'd need to make sure that the relevant information from the backup manifest was available to the reconstruction process at the right part of the code. When iterating over a directory, it would need to consider all of the files actually present in that directory plus any "hallucinated" incremental stubs from the manifest. I didn't feel confident of my ability to implement that in the available time without messing anything up. I think we should consider other possible designs here. For example, instead of including this in the manifest, we could ship one INCREMENTAL.STUBS file per directory that contains a list of all of the incremental stubs that should be created in that directory. My guess is that something like this will turn out to be way simpler to code. -- Robert Haas EDB: http://www.enterprisedb.com