On 9/10/21, 8:22 AM, "Robert Haas" <robertmh...@gmail.com> wrote:
> On Fri, Sep 10, 2021 at 10:19 AM Julien Rouhaud <rjuju...@gmail.com> wrote:
>> Those approaches don't really seems mutually exclusive?  In both case
>> you will need to internally track the status of each WAL file and
>> handle non contiguous file sequences.  In case of parallel commands
>> you only need additional knowledge that some commands is already
>> working on a file.  Wouldn't it be even better to eventually be able
>> launch multiple batches of multiple files rather than a single batch?
>
> Well, I guess I'm not convinced. Perhaps people with more knowledge of
> this than I may already know why it's beneficial, but in my experience
> commands like 'cp' and 'scp' are usually limited by the speed of I/O,
> not the fact that you only have one of them running at once. Running
> several at once, again in my experience, is typically not much faster.
> On the other hand, scp has a LOT of startup overhead, so it's easy to
> see the benefits of batching.
>
> [...]
>
>> If we start with parallelism first, the whole ecosystem could
>> immediately benefit from it as is.  To be able to handle multiple
>> files in a single command, we would need some way to let the server
>> know which files were successfully archived and which files weren't,
>> so it requires a different communication approach than the command
>> return code.
>
> That is possibly true. I think it might work to just assume that you
> have to retry everything if it exits non-zero, but that requires the
> archive command to be smart enough to do something sensible if an
> identical file is already present in the archive.
My initial thinking was similar to Julien's.  Assuming I have an
archive_command that handles one file at a time, I can just set
archive_max_workers to 3 and reap the benefits.  If I'm using an
existing utility that implements its own parallelism, I can keep
archive_max_workers at 1 and continue using it as before.  This would
be a simple incremental improvement.

That being said, I think the discussion about batching is a good one
to have.  If the overhead described in your scp example is
representative of a typical archive_command, then parallelism does
seem a bit silly.  We'd essentially be using a ton more resources when
there's obvious room for improvement via reducing the amount of
overhead per archive.  I think we could easily make the batch size
configurable so that existing archive commands would continue to work
(e.g., archive_batch_size=1).  However, unlike the simple parallel
approach, you'd likely have to adjust your archive_command if you
wanted to make use of batching.  That doesn't seem terrible to me,
though.  As discussed above, there are some implementation details to
work out for handling archive failures, but nothing about that seems
intractable to me.  Plus, if you still wanted to parallelize things,
feeding your archive_command several files at a time could still be
helpful.  A rough sketch of what I have in mind is below.

I'm currently leaning toward exploring the batching approach first.
I suppose we could always make a prototype of both solutions for
comparison with some "typical" archive commands if that would help
with the discussion.
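To make that concrete, here's a sketch of what a batch-aware setup
might look like.  Everything here is hypothetical: neither GUC exists
today, and how the server would pass the file list to the command
(multiple arguments, a new placeholder, stdin, ...) is exactly one of
the details we'd need to work out.  The sketch just assumes the WAL
paths arrive as arguments and that a non-zero exit causes the server
to retry the whole batch, which is why the script skips files that are
already present with identical contents, per Robert's point above.

    # postgresql.conf (hypothetical settings, for illustration only)
    archive_mode = on
    archive_command = '/usr/local/bin/batch_archive.sh %p'  # how multiple paths get passed is TBD
    archive_batch_size = 8       # hand up to 8 WAL files to one invocation (hypothetical GUC)
    archive_max_workers = 1      # single worker; batching alone provides the win (hypothetical GUC)

    #!/bin/sh
    # batch_archive.sh -- sketch of a batch-aware archive command.
    # Each argument is the path of one WAL segment to archive.  Copy
    # each file, skip ones already archived with identical contents,
    # and exit non-zero if anything failed so the server retries the
    # whole batch.
    archive_dir=/mnt/server/archivedir
    status=0
    for wal in "$@"; do
        name=$(basename "$wal")
        if [ -f "$archive_dir/$name" ] && cmp -s "$wal" "$archive_dir/$name"; then
            continue                      # already archived, nothing to do
        fi
        cp "$wal" "$archive_dir/$name.tmp" \
            && mv "$archive_dir/$name.tmp" "$archive_dir/$name" \
            || status=1
    done
    exit $status

With something like this, an existing single-file command keeps
working at archive_batch_size=1, and the per-invocation startup cost
gets amortized across the batch once the command is taught to loop
over its arguments.

Nathan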