On Fri, Sep 10, 2021 at 9:13 PM Robert Haas <robertmh...@gmail.com> wrote:
>
> To me, it seems way more beneficial to think about being able to
> invoke archive_command with many files at a time instead of just one.
> I think for most plausible archive commands that would be way more
> efficient than what you propose here. It's *possible* that if we had
> that, we'd still want this, but I'm not even convinced.
Those approaches don't really seem mutually exclusive? In both cases you will need to internally track the status of each WAL file and handle non-contiguous file sequences. In the parallel-command case, the only additional knowledge you need is that some command is already working on a given file. Wouldn't it be even better to eventually be able to launch multiple batches of multiple files rather than a single batch? If we start with parallelism first, the whole ecosystem could immediately benefit from it as is.

To be able to handle multiple files in a single command, we would need some way to let the server know which files were successfully archived and which weren't, so it requires a different communication channel than the command's return code. But as I said, I'm not convinced that archive_command is the best interface for that. If I understand correctly, most of the backup solutions would prefer to have a daemon launched once and used as a queuing system. Wouldn't it be better to have a new archive_mode, e.g. "daemon", and have postgres be responsible for (re)starting it, passing information through the daemon's stdin/stdout or something like that?
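
To make that a bit more concrete, here is a minimal sketch of what such a daemon could look like. Everything in it is an assumption on my part rather than an existing interface: the line-based protocol (one WAL path per line on stdin, "OK <name>" / "FAIL <name>" back on stdout), the hardcoded archive directory, and the omission of fsync/retry logic that a real implementation would obviously need:

/*
 * Hypothetical archiver daemon for an imagined archive_mode = "daemon".
 * Assumed protocol, not an existing PostgreSQL interface:
 *   server -> daemon (stdin):  one WAL file path per line
 *   daemon -> server (stdout): "OK <path>\n" or "FAIL <path>\n"
 */
#include <stdio.h>
#include <string.h>

#define ARCHIVE_DIR "/mnt/archive"		/* assumed destination */

static int
archive_one_file(const char *path)
{
	char		dest[4096];
	char		buf[65536];
	size_t		n;
	FILE	   *in,
			   *out;
	const char *base = strrchr(path, '/');

	base = base ? base + 1 : path;
	snprintf(dest, sizeof(dest), "%s/%s", ARCHIVE_DIR, base);

	if ((in = fopen(path, "rb")) == NULL)
		return -1;
	if ((out = fopen(dest, "wb")) == NULL)
	{
		fclose(in);
		return -1;
	}
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
	{
		if (fwrite(buf, 1, n, out) != n)
		{
			fclose(in);
			fclose(out);
			return -1;
		}
	}
	fclose(in);
	if (fclose(out) != 0)
		return -1;
	/* a real daemon would fsync the file and its directory here */
	return 0;
}

int
main(void)
{
	char		line[4096];

	/* report each result as soon as it is known */
	setvbuf(stdout, NULL, _IOLBF, 0);

	while (fgets(line, sizeof(line), stdin) != NULL)
	{
		line[strcspn(line, "\n")] = '\0';
		if (archive_one_file(line) == 0)
			printf("OK %s\n", line);
		else
			printf("FAIL %s\n", line);
	}
	return 0;
}

The nice property of reporting per-file status on stdout is that the server can hand the daemon a whole batch (or several overlapping batches) and still learn exactly which files are safe to recycle, which is what the return-code interface fundamentally can't express.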