On Mon, Apr 20, 2020 at 4:19 PM Andres Freund <and...@anarazel.de> wrote:
> Why do we want parallelism here? Or to be more precise: what do we hope
> to accelerate by making what part of creating a base backup
> parallel? There are several potential bottlenecks, and I think it's
> important to know the design priorities to evaluate a potential design.
>
> Bottlenecks (not ordered by importance):
> - compression performance (likely best solved by multiple compression
>   threads and a better compression algorithm)
> - unencrypted network performance (I'd like to see benchmarks showing in
>   which cases multiple TCP streams help / at which bandwidth it starts
>   to help)
> - encrypted network performance, i.e. SSL overhead (not sure this is an
>   important problem on modern hardware, given hardware-accelerated AES)
> - checksumming overhead (a serious problem for cryptographic checksums,
>   but presumably not for others)
> - file IO (presumably multiple facets here: number of concurrent
>   in-flight IOs, kernel page cache overhead when reading TBs of data)
>
> I'm not really convinced that a design addressing the more crucial
> bottlenecks really needs multiple fe/be connections. But that seems to
> have been the focus of the discussion so far.
I haven't evaluated this. Both BART and pgBackRest offer parallel backup options, and I'm pretty sure both were performance-tested and found to be very significantly faster, but I didn't write the code for either, nor have I evaluated either to figure out exactly why it was faster.

My suspicion is that it has mostly to do with adequately utilizing the hardware resources on the server side. If you are network-constrained, adding more connections won't help, unless there's something shaping the traffic which can be gamed by having multiple connections.

However, as things stand today, at any given point in time the base backup code on the server will EITHER be attempting a single filesystem I/O or a single network I/O, and likewise for the client. If a backup client - either current or hypothetical - is compressing and encrypting, then it doesn't have either a filesystem I/O or a network I/O in progress while it's doing so. You not only take the hit of the time required for compression and/or encryption, but also use that much less of the available network and/or I/O capacity.

While I agree that some of these problems could likely be addressed in other ways, parallelism seems to offer an approach that could solve multiple issues at the same time. If you want to address it without that, you need asynchronous filesystem I/O and asynchronous network I/O, both of those on both the client and server side, plus multithreaded compression and multithreaded encryption and maybe some other things. That sounds pretty hairy and hard to get right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
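[Editor's note: the point about strictly serialized filesystem and network I/O can be illustrated with a small simulation. This is not PostgreSQL code; all names and latency numbers below are invented for illustration. It contrasts the current one-thing-at-a-time loop with a pipelined variant in which a reader thread keeps filesystem I/O in flight while the sender is busy, so the two latencies overlap.]

```python
import queue
import threading
import time

READ_S = 0.01   # simulated filesystem read latency per block (invented)
SEND_S = 0.01   # simulated network send latency per block (invented)
BLOCKS = 20

def read_block(i):
    time.sleep(READ_S)          # stand-in for a filesystem read
    return b"x" * 8192

def send_block(data):
    time.sleep(SEND_S)          # stand-in for a network send

def serial_backup():
    # Mirrors the situation described above: at any instant there is
    # EITHER a filesystem read OR a network send in flight, never both.
    start = time.monotonic()
    for i in range(BLOCKS):
        send_block(read_block(i))
    return time.monotonic() - start

def pipelined_backup():
    # A reader thread fills a bounded queue while the main thread
    # sends, so read latency and send latency overlap.
    q = queue.Queue(maxsize=4)

    def reader():
        for i in range(BLOCKS):
            q.put(read_block(i))
        q.put(None)             # sentinel: no more blocks

    start = time.monotonic()
    t = threading.Thread(target=reader)
    t.start()
    while (data := q.get()) is not None:
        send_block(data)
    t.join()
    return time.monotonic() - start

if __name__ == "__main__":
    print(f"serial:    {serial_backup():.2f}s")
    print(f"pipelined: {pipelined_backup():.2f}s")
```

With equal read and send latencies, the serial loop pays both costs for every block, while the pipelined loop's total time approaches the larger of the two - roughly a 2x difference here. Real compression and encryption stages would add further serialized steps to the left-hand version, which is the argument for either parallel workers or pervasive asynchrony.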