On 4/12/24 22:40, Tomas Vondra wrote:
On 4/12/24 11:50, David Steele wrote:
On 4/12/24 19:09, Magnus Hagander wrote:
On Fri, Apr 12, 2024 at 12:14 AM David Steele <da...@pgmasters.net> wrote:
> But yeah, having to keep the backups as expanded directories is not
> great, I'd love to have .tar. Not necessarily because of the disk space
> (in my experience the compression in filesystems works quite well for
> this purpose), but mostly because it's more compact and allows working
> with backups as a single piece of data (e.g. it's much clearer what the
> checksum of a single .tar is, compared to a directory).
But again, object stores are commonly used for backup these days and
billing is based on data stored rather than any compression that can be
done on the data. Of course, you'd want to store the compressed tars in
the object store, but that does mean storing an expanded copy somewhere
to do pg_combinebackup.
Object stores are definitely getting more common. I wish they were
getting a lot more common than they actually are, because they
simplify a lot. But they're in my experience still very far from
being a majority.
I see it the other way, especially the last few years. The majority
seem to be object stores, followed closely by NFS. Directly mounted
storage on the backup host appears to be rarer.
One thing I'd mention is that not having built-in support for .tar and
.tgz backups does not mean it's impossible to use pg_combinebackup with
archives. You can mount them using e.g. "ratarmount" and then use those
mount points as source directories for pg_combinebackup.

It's not entirely frictionless because AFAICS it's necessary to do the
backup in plain format and then create the .tar to get the expected
"flat" directory structure (and not manifest + 2x tar). But other than
that it seems to work fine (based on my limited testing).
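
For anyone who wants to try it, the commands would be roughly as below
(paths and archive names are made up here, and this assumes each .tar
was created from a plain-format backup so the mounted tree has the flat
layout pg_combinebackup expects):

    # mount the full and incremental archives via FUSE (hypothetical paths)
    ratarmount /backups/full.tar /mnt/full
    ratarmount /backups/incr1.tar /mnt/incr1

    # reconstruct a complete data directory from the mounted backups
    pg_combinebackup /mnt/full /mnt/incr1 -o /restore/data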
Well, that's certainly convoluted and doesn't really help a lot in
terms of space consumption; it just shifts the additional space
required to the backup side. I doubt this is something we'd be willing
to add to our documentation, so it would be up to the user to figure
out and script.
FWIW the "archivemount" performs terribly, so adding this capability
into pg_combinebackup is clearly far from trivial.
I imagine this would perform pretty badly. And yes, doing it
efficiently is not trivial but certainly doable. Scanning the tar file
and matching entries to the manifest is one way, but I would prefer to
store each file's offset into the tar in the manifest and then assemble
an ordered list of work to do on each tar file. But of course the
latter requires a manifest-centric approach, which is not what we have
right now.
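
To make the idea concrete, a manifest entry could carry something like
the following (purely hypothetical sketch -- the current backup_manifest
format has nothing like "Tar-File" or "Tar-Offset"):

    { "Path": "base/1/16384", "Size": 8192,
      "Checksum-Algorithm": "CRC32C", "Checksum": "d1c2a9f3",
      "Tar-File": "base.tar", "Tar-Offset": 123456 }

With that, pg_combinebackup could sort the entries by offset and read
each tar sequentially, rather than seeking back and forth.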
Regards,
-David