Re: WIP/PoC for parallel backup

Robert Haas Fri, 12 Jun 2020 10:29:33 -0700

On Thu, Jun 11, 2020 at 1:41 PM Hamid Akhtar <hamid.akh...@gmail.com> wrote:
> As far I understand, parallel backup is not a mandatory performance feature, 
> rather, one at user's discretion. This IMHO indicates that it will benefit 
> some users and it may not others.
>
> IMHO, parallel backup has obvious performance benefits, but we need to ensure 
> that users understand that there is potential for slower backup if there is 
> no competition for resources.


I am sure that nobody is arguing that the patch has to be beneficial
in all cases in order to justify applying it. However, there are
several good arguments against proceding with this patch:

* Every version of the patch that has been reviewed by anybody has
been riddled with errors. Over and over again, testers have found
serious bugs, and code reviewers have noticed lots of problems, too.

* This approach requires rewriting a lot of current functionality,
either by moving it to the client side or by restructuring it to work
with parallelism. That's a lot of work, and it seems likely to
generate more work in the future as people continue to add features.
It's one thing to add a feature that doesn't benefit everybody; it's
another thing to add a feature that doesn't benefit everybody and also
hinders future development. See
http://postgr.es/m/ca+tgmozublxyr+pd_gi3mvgyv5hqdlm-gbrvxkun-lewaw1...@mail.gmail.com
for more discussion of these issues.

* The scenarios in which the patch delivers a performance benefit are
narrow and somewhat contrived. In remote backup scenarios, AIUI, the
patch hasn't been shown to help. In local backups, it does, but how
likely is it that you are going to do your local backups over the wire
protocol instead of by direct file copy, which is probably much
faster? I agree that if your server is overloaded, having multiple
processes competing for the server resources will allow backup to get
a larger slice relative to other things, but that seems like a pretty
hackish and inefficient solution to that problem. You could also argue
that we could provide a feature to prioritize some queries over other
queries by running them with tons of parallel workers just to convince
the OS to give them more resources, and I guess that would work, but
it would also waste tons of resources and possibly cripple or even
crash your system if you used it enough. The same argument applies
here.

* Even when the patch does provide a benefit, it seems to max out at
about 2.5X. Clearly it's nice to have something go 2.5X faster, but
the point is that it doesn't scale beyond that no matter how many
workers you add. That doesn't automatically mean that something is a
bad idea, but it is a concern. At the very least, we should be able to
say why it doesn't scale any better than that.

* Actually, we have some hints about that. Over at
http://postgr.es/m/20200503174922.mfzzdafa5g4rl...@alap3.anarazel.de
Andres has shown that too much concurrency when copying files results
in a dramatic performance reduction, and that a lot of the reason why
concurrency helps in the first place has to do with the fact that
pg_basebackup does not have any cache control (no fallocate,
sync_file_range(WRITE), posix_fadvise(DONTNEED)). When those things
are added the performance gets better and the benefits of concurrency
are reduced. I suspect that would also be true for this patch. It
would be unreasonable to commit a large patch, especially one that
would hinder future development, if we could get the same benefits
from a small patch that would not do so.

I am not in a position to tell you how to spend your time, so you can
certainly pursue this patch if you wish. However, I think it's
probably not the best use of time. Even if you fixed all the bugs and
reimplemented all of the functionality that needs reimplementing in
order to make this approach work, it still doesn't make sense to
commit the patch if either (a) we can obtain the same benefit, or most
of it, from a much simpler patch or (b) the patch is going to make it
significantly harder to develop other features that we want to have,
especially if those features seem likely to be more beneficial than
what this patch offers. I think both of those are likely true here.

For an example of (b), consider compression of tar files on the server
side before transmission to the client. If you take the approach this
patch does and move tarfile construction to the client, that is
impossible. Now you can argue (and perhaps you will) that this would
just mean someone has to choose between using this feature and using
that feature, and why should users not have such a choice? That is a
fair argument, but my counter-argument is that users shouldn't be
forced into making that choice. If the parallel feature is beneficial
enough to justify having it, then it ought to be designed in such a
way that it works with the other features we also want to have rather
than forcing users to choose between them. Since I have already
proposed (on the other thread linked above) a design that would make
that possible, and this design does not, I have a hard time
understanding why we would pick this one, especially given all of the
other disadvantages which it seems to have.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: WIP/PoC for parallel backup

Reply via email to