On Sun, May 3, 2020 at 1:49 PM Andres Freund <and...@anarazel.de> wrote:
> > > The run-to-run variations between the runs without cache control are
> > > pretty large. So this is probably not the end-all-be-all numbers. But I
> > > think the trends are pretty clear.
> >
> > Could you be explicit about what you think those clear trends are?
>
> Largely that concurrency can help a bit, but also hurt
> tremendously. Below is some more detailed analysis, it'll be a bit
> long...
OK, thanks. Let me see if I can summarize here. On the strength of
previous experience, you'll probably tell me that some parts of this
summary are wildly wrong or at least "not quite correct," but I'm
going to try my best.

- Server-side compression seems like it has the potential to be a
significant win by stretching bandwidth. We likely need to do it with
10+ parallel threads, at least for stronger compressors, but these
might be threads within a single PostgreSQL process rather than
multiple separate backends. (A rough sketch of what I mean is at the
end of this mail.)

- Client-side cache management -- that is, use of
posix_fadvise(DONTNEED), posix_fallocate, and sync_file_range, where
available -- looks like it can improve write rates and CPU efficiency
significantly. Larger block sizes show a win when used together with
such techniques. (See the second sketch below.)

- The benefits of multiple concurrent connections remain somewhat
elusive. Peter Eisentraut hypothesized upthread that such an approach
might be the most practical way forward for networks with a high
bandwidth-delay product, and I hypothesized that such an approach
might be beneficial when there are multiple tablespaces on
independent disks, but we don't have clear experimental support for
those propositions. Also, both your data and mine indicate that too
much parallelism can lead to major regressions.

- Any work we do while trying to make backup super-fast should also
lend itself to super-fast restore, possibly including parallel
restore. Compressed tarfiles don't permit random access to member
files. Uncompressed tarfiles do, but software that works this way is
not commonplace. The only mainstream archive format that seems to
support random access is zip. Adopting that wouldn't be crazy, but
might limit our choice of compression options more than we'd like. A
tar file of individually compressed files might be a plausible
alternative, though there would probably be some hit to compression
ratios for small files. Then again, if a single, highly efficient
process can handle a server-to-client backup, maybe the same is true
for extracting a compressed tarfile... (The third sketch below shows
the header arithmetic that makes seeking within an uncompressed tar
possible.)

Thoughts?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
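
P.S. Since it's easier to argue about something concrete, here are a
few rough sketches of the mechanisms mentioned above. They are
illustrations of the ideas, not proposed patches.

First, compression threads inside a single process. This one leans on
libzstd's built-in worker threads (ZSTD_c_nbWorkers, available in
zstd >= 1.4 when the library is built with multithreading support);
zstd itself, the level, and the worker count are all just assumptions
on my part, as is the file name:

    /*
     * zstd_mt_sketch.c -- compress stdin to stdout using zstd's
     * internal worker threads, i.e. parallelism within one process
     * rather than multiple separate backends.
     *
     * Build: cc zstd_mt_sketch.c -lzstd -o zstd_mt_sketch
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <zstd.h>

    int
    main(void)
    {
        ZSTD_CCtx  *cctx = ZSTD_createCCtx();
        size_t      inSize = ZSTD_CStreamInSize();
        size_t      outSize = ZSTD_CStreamOutSize();
        char       *inBuf = malloc(inSize);
        char       *outBuf = malloc(outSize);

        /* Level and worker count are illustrative, not a recommendation. */
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 9);
        ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 10);

        for (;;)
        {
            size_t      nread = fread(inBuf, 1, inSize, stdin);
            ZSTD_EndDirective mode = (nread < inSize) ? ZSTD_e_end : ZSTD_e_continue;
            ZSTD_inBuffer input = {inBuf, nread, 0};
            int         finished;

            /* Keep draining output until this chunk is fully consumed. */
            do
            {
                ZSTD_outBuffer output = {outBuf, outSize, 0};
                size_t      remaining = ZSTD_compressStream2(cctx, &output,
                                                             &input, mode);

                if (ZSTD_isError(remaining))
                {
                    fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(remaining));
                    return 1;
                }
                fwrite(outBuf, 1, output.pos, stdout);
                finished = (mode == ZSTD_e_end) ? (remaining == 0)
                                                : (input.pos == input.size);
            } while (!finished);

            if (mode == ZSTD_e_end)
                break;
        }
        ZSTD_freeCCtx(cctx);
        return 0;
    }

The appeal of this shape is that the protocol-facing code stays
single-threaded; only the CPU-heavy compression fans out.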
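
Second, the client-side write path with cache control. This is
Linux-specific, since sync_file_range() isn't portable, and the 8MB
flush window is just a number I picked:

    /*
     * cache_control_sketch.c -- copy stdin to a file while trying to
     * keep the written data from accumulating in the page cache.  A
     * sketch of the idea only, not pg_basebackup's actual write path.
     */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK (8 * 1024 * 1024)   /* flush window; size is a guess */

    int
    main(int argc, char **argv)
    {
        int         fd = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        char       *buf = malloc(CHUNK);
        off_t       written = 0;
        ssize_t     n;

        /*
         * If the final size were known up front -- and for most base
         * backup files it is -- posix_fallocate(fd, 0, final_size)
         * here would reserve the space in one contiguous go.
         */
        while ((n = read(0, buf, CHUNK)) > 0)
        {
            if (write(fd, buf, n) != n)
            {
                perror("write");
                return 1;
            }

            /* Start asynchronous writeback of the chunk just written. */
            sync_file_range(fd, written, n, SYNC_FILE_RANGE_WRITE);

            if (written > 0)
            {
                /* Wait for older data to reach disk, then evict it. */
                sync_file_range(fd, 0, written,
                                SYNC_FILE_RANGE_WAIT_BEFORE |
                                SYNC_FILE_RANGE_WRITE |
                                SYNC_FILE_RANGE_WAIT_AFTER);
                posix_fadvise(fd, 0, written, POSIX_FADV_DONTNEED);
            }
            written += n;
        }
        close(fd);
        return 0;
    }

Issuing these calls per small block is where syscall overhead would
show up, which is presumably part of why the larger block sizes
helped in combination with these techniques.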
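
Third, the random-access point about uncompressed tar. It isn't O(1)
lookup, but because each member's size sits in its 512-byte ustar
header, a reader can hop from header to header with lseek() and never
touch the data in between -- which is exactly the property that
compressing the whole stream destroys. A toy member-finder, ignoring
GNU long-name extensions and other wrinkles (program and argument
names are made up):

    /*
     * tar_seek_sketch.c -- locate a member in an uncompressed tar by
     * walking headers and seeking past data.
     *
     * Usage: tar_seek_sketch FILE.tar MEMBER
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define BLOCK 512

    int
    main(int argc, char **argv)
    {
        int         fd = open(argv[1], O_RDONLY);
        unsigned char hdr[BLOCK];

        /* A zero-filled block marks end-of-archive (crude check). */
        while (read(fd, hdr, BLOCK) == BLOCK && hdr[0] != '\0')
        {
            char        name[101];
            long        size;

            /* ustar: name at offset 0 (100 bytes), size in octal
             * ASCII at offset 124 (12 bytes). */
            memcpy(name, hdr, 100);
            name[100] = '\0';
            size = strtol((char *) hdr + 124, NULL, 8);

            if (strcmp(name, argv[2]) == 0)
            {
                printf("%s: %ld bytes at offset %ld\n",
                       name, size, (long) lseek(fd, 0, SEEK_CUR));
                return 0;
            }
            /* Skip the member's data, rounded up to a full block. */
            lseek(fd, (size + BLOCK - 1) / BLOCK * BLOCK, SEEK_CUR);
        }
        fprintf(stderr, "%s not found\n", argv[2]);
        return 1;
    }

Note that a tar of individually compressed files would keep this
property, since the seek arithmetic only needs the stored (i.e.
compressed) sizes from the headers.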