On Fri, Sep 10, 2021 at 1:07 PM Bossart, Nathan <bossa...@amazon.com> wrote:
> That being said, I think the discussion about batching is a good one
> to have. If the overhead described in your SCP example is
> representative of a typical archive_command, then parallelism does
> seem a bit silly.
I think that's pretty realistic, because a lot of people's archive commands are going to actually be, or need to use, scp specifically. However, there are also cases where people are using commands that just put the file in some local directory (maybe on a remote mount point), and I would expect the startup overhead to be much less in those cases. Maybe people are archiving via HTTPS or similar as well, and then you again have some connection overhead, though I suspect not as much as with scp, since web pages do not take 3 seconds to get an HTTPS connection going. I don't know why scp is so crazy slow.

Even in the relatively low-overhead cases, though, I think we would want to do some real testing to see if the benefits are as we expect. See http://postgr.es/m/20200420211018.w2qphw4yybcbx...@alap3.anarazel.de and downthread for context. I was *convinced* that parallel backup was a win. The benchmarking was a tad underwhelming, but a synthetic test of copying a lot of files serially or in parallel, with the files spread across multiple filesystems on the same physical box, did show a clear if modest benefit. However, when Andres modified my test program to use posix_fadvise(), posix_fallocate(), and sync_file_range() while doing the copies, the benefits of parallelism largely evaporated, and in fact in some cases enabling parallelism caused major regressions. In other words, the apparent benefits of parallelism were really due to suboptimal behaviors in the Linux page cache and some NUMA effects that were in fact avoidable.

So I'm suspicious that the same things might end up being true here. It's not exactly the same, because the goal of WAL archiving is to keep up with the rate of WAL generation, and the goal of a backup is (unless max-rate is used) to finish as fast as possible, and that difference in goals might end up being significant. Also, you can make an argument that some people will benefit from a parallelism feature even if a perfectly-implemented archive_command doesn't, because many people use really terrible archive_commands. But all that said, I think the parallel backup discussion is still a cautionary tale to which some attention ought to be paid.
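To make that concrete, the kind of per-file copy loop I'm talking about looks roughly like the following. This is only a sketch, not the actual test program Andres modified: it assumes Linux (for sync_file_range()), the chunk size, flag choices, and function names are just illustrative, and most error handling is elided.

/*
 * Sketch only: copy one file while giving the kernel hints about how
 * the data will be used, in the spirit of the changes described above.
 */
#define _GNU_SOURCE             /* for sync_file_range() */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define CHUNK_SIZE (1024 * 1024)    /* illustrative */

static int
copy_with_hints(const char *srcpath, const char *dstpath)
{
    char       *buf = malloc(CHUNK_SIZE);
    int         src = open(srcpath, O_RDONLY);
    int         dst = open(dstpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
    struct stat st;
    off_t       offset = 0;
    ssize_t     nread;

    if (buf == NULL || src < 0 || dst < 0 || fstat(src, &st) != 0)
        return -1;

    /* we'll read the source sequentially, and only once */
    posix_fadvise(src, 0, 0, POSIX_FADV_SEQUENTIAL);
    /* reserve space for the destination up front */
    posix_fallocate(dst, 0, st.st_size);

    while ((nread = read(src, buf, CHUNK_SIZE)) > 0)
    {
        if (write(dst, buf, nread) != nread)
            return -1;
        /* start writeback of the chunk we just wrote ... */
        sync_file_range(dst, offset, nread, SYNC_FILE_RANGE_WRITE);
        /* ... and drop the source pages we no longer need */
        posix_fadvise(src, offset, nread, POSIX_FADV_DONTNEED);
        offset += nread;
    }

    if (nread < 0 || fsync(dst) != 0 || close(dst) != 0 || close(src) != 0)
        return -1;
    free(buf);
    return 0;
}

int
main(int argc, char **argv)
{
    if (argc != 3)
    {
        fprintf(stderr, "usage: %s srcfile dstfile\n", argv[0]);
        return 1;
    }
    return copy_with_hints(argv[1], argv[2]) == 0 ? 0 : 1;
}

The point is just that the parallelism in my earlier test was mostly compensating for page-cache behavior that a copy loop along these lines avoids directly.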
> We'd essentially be using a ton more resources when
> there's obvious room for improvement via reducing amount of overhead
> per archive. I think we could easily make the batch size configurable
> so that existing archive commands would work (e.g.,
> archive_batch_size=1). However, unlike the simple parallel approach,
> you'd likely have to adjust your archive_command if you wanted to make
> use of batching. That doesn't seem terrible to me, though. As
> discussed above, there are some implementation details to work out for
> archive failures, but nothing about that seems intractable to me.
> Plus, if you still wanted to parallelize things, feeding your
> archive_command several files at a time could still be helpful.

Yep.

> I'm currently leaning toward exploring the batching approach first. I
> suppose we could always make a prototype of both solutions for
> comparison with some "typical" archive commands if that would help
> with the discussion.

Yeah, I think the concerns here are more pragmatic than philosophical, at least for me. I had kind of been thinking that the way to attack this problem is to go straight to allowing for a background worker, because the other problem with archive_command is that running a shell command like cp, scp, or rsync is not really safe. It won't fsync your data, it might not fail if the file is already present in the archive, and it definitely won't succeed without doing anything if a byte-for-byte identical file is already in the archive while failing if a file with different contents is already there. Fixing that stuff by running different shell commands is hard, but it wouldn't be that hard to do it in C code, and you could then also extend whatever code you wrote to do batching and parallelism; starting more workers isn't hard.

However, I can't see the idea of running a shell command going away any time soon, in spite of its numerous and severe drawbacks. Such an interface provides a huge degree of flexibility and allows system admins to whack around behavior easily, which you don't get if you have to code every change in C. So I think command-based enhancements are fine to pursue also, even though I don't think it's the ideal place for most users to end up.

-- 
Robert Haas
EDB: http://www.enterprisedb.com