Re: pg_combinebackup fails on file named INCREMENTAL.*

David Steele Mon, 15 Apr 2024 19:12:28 -0700

On 4/16/24 06:33, Robert Haas wrote:

On Sun, Apr 14, 2024 at 11:53 PM David Steele <[email protected]> wrote:

Since incremental backup is using INCREMENTAL as a keyword (rather than
storing incremental info in the manifest) it is vulnerable to any file
in PGDATA with the pattern INCREMENTAL.*.


Yeah, that's true. I'm not greatly concerned about this, because I
think anyone who creates a file called INCREMENTAL.CONFIG is just

being perverse.

Well, it's INCREMENTAL.* and you might be surprised (though I doubt it)at what users will do. We've been caught out by wacky file names (anddatabase names) a few times.

However, we could ignore those files just in subtrees
where they're not expected to occur i.e. only pay attention to them
under base, global, and pg_tblspc. I didn't do this because I thought
someone might eventually want to do something like incremental backup
of SLRU files, and then it would be annoying if there were a bunch of
random pathname restrictions. But if we're really concerned about
this, I think it would be a reasonable fix; you're not really supposed
to decide to store your configuration files in directories that exist
for the purpose of storing relation data files.

I think it would be reasonable to restrict what can be put in base/ andglobal/ but users generally feel free to create whatever they want inthe root of PGDATA, despite being strongly encouraged not to.

Anyway, I think it should be fixed or documented as a caveat since itcauses a hard failure on restore.

We could fix the issue by forbidding this file pattern in PGDATA, i.e.
error when it is detected during pg_basebackup, but in my view it would
be better (at least eventually) to add incremental info to the manifest.
That would also allow us to skip storing zero-length files and
incremental stubs (with no changes) as files.


I did think about this, and I do like the idea of being able to elide
incremental stubs. If you have a tremendous number of relations and
very few of them have changed at all, the incremental stubs might take
up a significant percentage of the space consumed by the incremental
backup, which kind of sucks, even if the incremental backup is still
quite small compared to the original database. And, like you, I had
the idea of trying to use the backup_manifest to do it.

But ... I didn't really end up feeling very comfortable with it. Right
now, the backup manifest is something we only use to verify the
integrity of the backup. If we were to do this, it would become a

critical part of the backup.

For my 2c the manifest is absolutely a critical part of the backup. I'mhaving a hard time even seeing why we have the --no-manifest option forpg_combinebackup at all. If the manifest is missing who knows what elsemight be missing as well? If present, why wouldn't we use it?

I know Tomas added some optimizations that work best with --no-manifestbut if we can eventually read compressed tars (which I expect to be thegeneral case) then those optimizations are not very useful.

I don't think I like the idea that
removing the backup_manifest should be allowed to, in effect, corrupt
the backup. But I think I dislike even more the idea that the data
that is used to verify the backup gets mushed together with the backup
data itself. Maybe in practice it's fine, but it doesn't seem very
conceptually clean.

I don't think this is a problem. The manifest can do more than storeverification info, IMO.

There are also some practical considerations that made me not want to
go in this direction: we'd need to make sure that the relevant
information from the backup manifest was available to the
reconstruction process at the right part of the code. When iterating
over a directory, it would need to consider all of the files actually
present in that directory plus any "hallucinated" incremental stubs
from the manifest. I didn't feel confident of my ability to implement
that in the available time without messing anything up.

I think it would be better to iterate over the manifest than files in adirectory. In any case we still need to fix [1], which presents more orless the same problem.

I think we should consider other possible designs here. For example,
instead of including this in the manifest, we could ship one
INCREMENTAL.STUBS file per directory that contains a list of all of
the incremental stubs that should be created in that directory. My
guess is that something like this will turn out to be way simpler to
code.

So we'd store a mini manifest per directory rather than just put theinfo in the main manifest? Sounds like unnecessary complexity to me.


Regards,
-David

[1]https://www.postgresql.org/message-id/flat/9badd24d-5bd9-4c35-ba85-4c38a2feb73e%40pgmasters.net

Re: pg_combinebackup fails on file named INCREMENTAL.*

Reply via email to