On Sun, Feb 19, 2023 at 2:45 AM Andres Freund <and...@anarazel.de> wrote:
> To me that seems even simpler? Nothing but the archiver is supposed to create
> .done files and nothing is supposed to remove .ready files without archiver
> having created the .done files. So the archiver process can scan
> archive_status until its done or until N archives have been collected, and
> then process them at once? Only the creation of the .done files would be
> serial, but I don't think that's commonly a problem (and could be optimized as
> well, by creating multiple files and then fsyncing them in a second pass,
> avoiding N filesystem journal flushes).
>
> Maybe I am misunderstanding what you see as the problem?

Well, right now the archiver process calls ArchiveFileCB when there's a
file ready for archiving, and that callback is supposed to archive the
whole file before returning. That pretty obviously seems to preclude
having more than one file being archived at the same time. What callback
structure do you have in mind to allow for that?

I mean, my idea was to basically just have one big callback:
ArchiverModuleMainLoopCB(). It wouldn't return, or perhaps would return
only when archiving was totally caught up and there was nothing more to
do right now. That callback could then call functions like
AreThereAnyMoreFilesIShouldBeArchivingAndIfYesWhatIsTheNextOne(). So it
would call that function, find out about a file, and start an HTTP
session or whatever; then it would call that function again and start
another HTTP session for the second file, and so on until it had as much
concurrency as it wanted. When it hit the concurrency limit, it would
wait until at least one HTTP request finished. At that point it would
call HeyEverybodyISuccessfullyArchivedAWalFile(), after which it could
again ask for the next file and start a request for that one, and so on
and so forth.

I don't really understand what the other possible model is here,
honestly. Right now, control remains within the archive module for the
entire time that a file is being archived. If we generalize the model to
allow multiple files to be in the process of being archived at the same
time, the archive module is going to need to have control as long as
>= 1 of them are in progress, at least AFAICS. If you have some other
idea how it would work, please explain it to me...

--
Robert Haas
EDB: http://www.enterprisedb.com
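
P.S. To make the control flow of the "one big callback" idea above a bit
more concrete, here is a rough, self-contained sketch in C. Nothing in it
is an existing PostgreSQL API: ArchiveQueue, next_file_to_archive(),
report_file_archived(), archive_main_loop() and MAX_IN_FLIGHT are all
hypothetical names made up for illustration, standing in for the
deliberately silly function names used above. There are no real HTTP
sessions either; "waiting for a request to finish" is simulated by
completing the oldest in-flight file.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IN_FLIGHT 3		/* hypothetical concurrency limit */

/* Hypothetical stand-in for the server's view of files awaiting archival. */
typedef struct ArchiveQueue
{
	const char **files;		/* names of files that are ready */
	size_t		nfiles;		/* number of entries in files[] */
	size_t		next;		/* index of the next file to hand out */
	size_t		ndone;		/* files reported as successfully archived */
} ArchiveQueue;

/* Stand-in for "are there any more files, and if so what is the next one?" */
static const char *
next_file_to_archive(ArchiveQueue *q)
{
	return (q->next < q->nfiles) ? q->files[q->next++] : NULL;
}

/* Stand-in for "I successfully archived a WAL file". */
static void
report_file_archived(ArchiveQueue *q, const char *file)
{
	(void) file;			/* a real server would create the .done file */
	assert(q->ndone < q->next);	/* can't finish more than was handed out */
	q->ndone++;
}

/*
 * Simulated module main loop.  Keeps control until nothing is in flight
 * and nothing is left to start; returns the peak number of files that
 * were in flight at once.
 */
static size_t
archive_main_loop(ArchiveQueue *q)
{
	const char *in_flight[MAX_IN_FLIGHT];
	size_t		n_in_flight = 0;
	size_t		peak = 0;

	for (;;)
	{
		const char *file;

		/* Start new transfers until we hit the limit or run out of files. */
		while (n_in_flight < MAX_IN_FLIGHT &&
			   (file = next_file_to_archive(q)) != NULL)
			in_flight[n_in_flight++] = file;	/* real code: start a request */

		if (n_in_flight > peak)
			peak = n_in_flight;

		if (n_in_flight == 0)
			break;			/* totally caught up: return to the server */

		/* Simulated wait: pretend the oldest transfer just completed. */
		report_file_archived(q, in_flight[0]);
		for (size_t i = 1; i < n_in_flight; i++)
			in_flight[i - 1] = in_flight[i];
		n_in_flight--;
	}

	return peak;
}
```

The point of the sketch is only that the loop does not return to its
caller while any file is still in flight, which is the property argued
for above.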