On 4/12/20 3:17 PM, Andres Freund wrote:
>> More generally, can you think of any ideas for how to structure an API
>> here that are easier to use than "write some C code"? Or do you think
>> we should tell people to write some C code if they want to
>> compress/encrypt/relocate their backup in some non-standard way?
>>
>> For the record, I'm not against eventually having more than one way to
>> do this, maybe a shell-script interface for simpler things and some
>> kind of API for more complex needs (e.g. NetBackup integration,
>> perhaps). And I did wonder if there was some other way we could do
>> this.
>
> I'm doubtful that an API based on string replacement is the way to
> go. It's hard for me to see how that's not either going to substantially
> restrict the way the "tasks" are done, or yield a very complicated
> interface.
>
> I wonder whether the best approach here could be that pg_basebackup (and
> perhaps other tools) opens pipes to/from a subcommand and over the pipe
> it communicates with the subtask using a textual ([2]) description of
> tasks. Like:
>
>   backup mode=files base_directory=/path/to/data/directory
>   backup_file name=base/14037/16396.14 size=1073741824
>   backup_file name=pg_wal/XXXX size=16777216
>
> or
>
>   backup mode=tar
>   base_directory /path/to/data/
>   backup_tar name=dir.tar size=983498875687487

This is pretty much what pgBackRest does. We call them "local" processes
and they do most of the work during backup/restore/archive-get/archive-push.
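
To make that concrete, here is a minimal sketch (Python, purely
illustrative) of the main-process side of such a pipe-based protocol: it
spawns a hypothetical "backup-worker" subcommand, writes one
line-delimited task per request, and reads one status line back. The
executable name, task fields, and reply format are assumptions for the
sketch, not pg_basebackup's or pgBackRest's actual interface.

import subprocess

# Tasks follow the shape of the example above; the fields are illustrative.
tasks = [
    "backup mode=files base_directory=/path/to/data/directory",
    "backup_file name=base/14037/16396.14 size=1073741824",
    "backup_file name=pg_wal/XXXX size=16777216",
]

# "backup-worker" is a hypothetical subcommand that reads tasks on stdin
# and writes one status line per task on stdout.
worker = subprocess.Popen(
    ["backup-worker"],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)

for task in tasks:
    worker.stdin.write(task + "\n")     # one task per line
    worker.stdin.flush()
    status = worker.stdout.readline().rstrip()
    print("worker:", status)            # e.g. "done name=base/14037/16396.14"

worker.stdin.close()
worker.wait()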

> The obvious problem with that proposal is that we don't want to
> unnecessarily store the incoming data on the system pg_basebackup is
> running on, just for the subcommand to get access to them. More on that
> in a second.

We also implement "remote" processes so the local processes can get data
that doesn't happen to be local, i.e. on a remote PostgreSQL cluster.

> A huge advantage of a scheme like this would be that it wouldn't have to
> be specific to pg_basebackup. It could just as well work directly on the
> server, avoiding an unnecessary loop through the network. Which
> e.g. could integrate with filesystem snapshots etc. Without needing to
> build the 'archive target' once with server libraries, and once with
> client libraries.

Yes -- needing to store the data locally or stream it through one main
process is a major bottleneck.
Working on the server is key because it allows you to compress before
transferring the data. With parallel processing it is trivial to flood a
network. We have a recent example from a community user of backing up
25TB in 4 hours. Compression on the server makes this possible (and a
fast network, in this case).
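
Just as a back-of-the-envelope check on that number (my arithmetic; the
3:1 compression ratio is an assumption for illustration only):

# 25 TB in 4 hours, before and after source-side compression.
total_bytes = 25e12                        # 25 TB
seconds = 4 * 3600                         # 4 hours
raw = total_bytes / seconds * 8 / 1e9      # ~13.9 Gbit/s uncompressed
compressed = raw / 3                       # ~4.6 Gbit/s at an assumed 3:1 ratio
print(raw, compressed)

So without compression at the source you're already past what a 10Gbit
link can sustain.
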
For security reasons, it's also nice to be able to encrypt data before
it leaves the database server. Calculating checksums/size at the source
is also ideal.

> One reason I think something like this could be advantageous over a C
> API is that it's quite feasible to implement it from a number of
> different languages, including shell if really desired, without needing
> to provide a C API via an FFI.

We migrated from Perl to C and kept our local/remote protocol the same,
which really helped. During the transition there were times when the C
code was driving a Perl local/remote and vice versa. The idea is
certainly workable in our experience.

> It'd also make it quite natural to split out compression from
> pg_basebackup's main process, which IME currently makes it not really
> feasible to use pg_basebackup's compression.

This is a major advantage.

> There's various ways we could address the issue for how the subcommand
> can access the file data. The most flexible probably would be to rely on
> exchanging file descriptors between basebackup and the subprocess (these
> days all supported platforms have that, I think). Alternatively we
> could invoke the subcommand before really starting the backup, and ask
> how many files it'd like to receive in parallel, and restart the
> subcommand with that number of file descriptors open.

We don't exchange FDs. Each local is responsible for getting the data
from PostgreSQL or the repo based on knowing the data source and a path.
For pg_basebackup, however, I'd imagine each local would want a
replication connection with the ability to request specific files that
were passed to it by the main process.
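
For what it's worth, the fd-exchange approach described above is pretty
simple on the platforms that matter these days. A minimal sketch (Python
3.9+, illustrative only, and not how pgBackRest moves data) over a
Unix-domain socketpair:

import os
import socket

# Parent plays "main process", child plays the worker; the descriptor for
# an already-open file is handed across a Unix-domain socketpair.
parent_sock, child_sock = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

if os.fork() == 0:
    # Worker: receive the file name and descriptor, read straight from the fd.
    parent_sock.close()
    name, fds, _flags, _addr = socket.recv_fds(child_sock, 1024, 1)
    with os.fdopen(fds[0], "rb") as f:
        print("worker got", name.decode(), "->", len(f.read()), "bytes")
    os._exit(0)

# Main process: open the file once and pass the fd plus its name.
child_sock.close()
fd = os.open("/etc/hostname", os.O_RDONLY)  # stand-in for a relation file
socket.send_fds(parent_sock, [b"base/14037/16396.14"], [fd])
os.close(fd)
os.wait()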

> [2] yes, I already hear json. A line-delimited format would have some
> advantages though.

We use JSON, but each protocol request/response is linefeed-delimited.
For example, here's what it looks like when the main process asks a
local process to back up a specific file:

{"cmd":"backupFile","param":["base/32768/33001",true,65536,null,true,0,"pg_data/base/32768/33001",false,0,3,"20200412-213313F",false,null]}

And the local responds with:
{"{"out":[1,65536,65536,"6bf316f11d28c28914ea9be92c00de9bea6d9a6b",{"align":true,"error":[0,[3,5],7],"valid":false}]}"}
We use arrays for parameters but of course these could be done with
objects for more readability.
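
If it helps to see the shape of the loop, here's a toy version (Python,
illustrative; the command set and response layout are simplified
stand-ins, not our actual handlers) of a local process serving that kind
of linefeed-delimited JSON protocol:

import hashlib
import json
import sys

def backup_file(path):
    # Pretend to back up a file: return its size and SHA-1, roughly the
    # size/checksum fields visible in the example response above.
    sha1 = hashlib.sha1()
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
            size += len(chunk)
    return [size, sha1.hexdigest()]

# One JSON request per line in, one JSON response per line out.
for line in sys.stdin:
    request = json.loads(line)
    if request["cmd"] == "backupFile":
        out = backup_file(request["param"][0])
    else:
        out = None
    sys.stdout.write(json.dumps({"out": out}) + "\n")
    sys.stdout.flush()
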
We are considering a move to HTTP since lots of services (e.g. S3, GCS,
Azure, etc.) require it (so we implement it) and we're not sure it makes
sense to maintain our own protocol format. That said, we'd still prefer
to use JSON for our payloads (like GCS) rather than XML (as S3 does).
Regards,
--
-David
da...@pgmasters.net