Re: pg_combinebackup fails on file named INCREMENTAL.*

Robert Haas Mon, 15 Apr 2024 13:33:54 -0700

On Sun, Apr 14, 2024 at 11:53 PM David Steele <[email protected]> wrote:
> Since incremental backup is using INCREMENTAL as a keyword (rather than
> storing incremental info in the manifest) it is vulnerable to any file
> in PGDATA with the pattern INCREMENTAL.*.


Yeah, that's true. I'm not greatly concerned about this, because I
think anyone who creates a file called INCREMENTAL.CONFIG is just
being perverse. However, we could ignore those files just in subtrees
where they're not expected to occur i.e. only pay attention to them
under base, global, and pg_tblspc. I didn't do this because I thought
someone might eventually want to do something like incremental backup
of SLRU files, and then it would be annoying if there were a bunch of
random pathname restrictions. But if we're really concerned about
this, I think it would be a reasonable fix; you're not really supposed
to decide to store your configuration files in directories that exist
for the purpose of storing relation data files.

> We could fix the issue by forbidding this file pattern in PGDATA, i.e.
> error when it is detected during pg_basebackup, but in my view it would
> be better (at least eventually) to add incremental info to the manifest.
> That would also allow us to skip storing zero-length files and
> incremental stubs (with no changes) as files.

I did think about this, and I do like the idea of being able to elide
incremental stubs. If you have a tremendous number of relations and
very few of them have changed at all, the incremental stubs might take
up a significant percentage of the space consumed by the incremental
backup, which kind of sucks, even if the incremental backup is still
quite small compared to the original database. And, like you, I had
the idea of trying to use the backup_manifest to do it.

But ... I didn't really end up feeling very comfortable with it. Right
now, the backup manifest is something we only use to verify the
integrity of the backup. If we were to do this, it would become a
critical part of the backup. I don't think I like the idea that
removing the backup_manifest should be allowed to, in effect, corrupt
the backup. But I think I dislike even more the idea that the data
that is used to verify the backup gets mushed together with the backup
data itself. Maybe in practice it's fine, but it doesn't seem very
conceptually clean.

There are also some practical considerations that made me not want to
go in this direction: we'd need to make sure that the relevant
information from the backup manifest was available to the
reconstruction process at the right part of the code. When iterating
over a directory, it would need to consider all of the files actually
present in that directory plus any "hallucinated" incremental stubs
from the manifest. I didn't feel confident of my ability to implement
that in the available time without messing anything up.

I think we should consider other possible designs here. For example,
instead of including this in the manifest, we could ship one
INCREMENTAL.STUBS file per directory that contains a list of all of
the incremental stubs that should be created in that directory. My
guess is that something like this will turn out to be way simpler to
code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: pg_combinebackup fails on file named INCREMENTAL.*

Reply via email to