On 9/7/21, 10:54 AM, "Robert Haas" <robertmh...@gmail.com> wrote:
> I guess what I don't understand about the multiple-files-per-directory
> scan implementation is what happens when something happens that would
> require the keep-trying-the-next-file approach to perform a forced
> scan. It seems to me that you still need to force an immediate full
> scan, because if the idea is that you want to, say, prioritize
> archiving of new timeline files over any others, a cached list of
> files that you should archive next doesn't accomplish that, just like
> keeping on trying the next file in sequence doesn't accomplish that.
Right. The latest patch for that approach [0] does just that. In fact,
I think timeline history files are the only files for which we need to
force an immediate directory scan in the multiple-files-per-scan
approach. For the keep-trying-the-next-file approach, we have to force
a directory scan for anything other than a regular WAL file that is
ahead of our archiver state (a rough sketch of that rule follows at the
end of this message).

> So I'm wondering if in the end the two approaches converge somewhat,
> so that with either patch you get (1) some kind of optimization to
> scan the directory less often, plus (2) some kind of notification
> mechanism to tell you when you need to avoid applying that
> optimization. If you wanted to, (1) could even include both batching
> and then, when the batch is exhausted, trying files in sequence. I'm
> not saying that's the way to go, but you could. In the end, it seems
> less important that we do any particular thing here and more important
> that we do something - but if prioritizing timeline history files is
> important, then we have to preserve that behavior.

Yeah, I would agree that the approaches basically converge into some
form of "do fewer directory scans."

Nathan

[0] https://www.postgresql.org/message-id/attachment/125980/0001-Improve-performance-of-pgarch_readyXlog-with-many-st.patch
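
To make the forced-scan rule for the keep-trying-the-next-file approach
concrete, here is a rough sketch. IsXLogFileName() and
XLogFromFileName() are the existing macros from xlog_internal.h, but
last_archived_segno and ForceDirScan() are placeholders for however the
archiver tracks its position and however it gets signaled; this is not
necessarily what the patch actually does:

    #include "postgres.h"

    #include "access/xlog.h"
    #include "access/xlog_internal.h"

    extern void ForceDirScan(void);     /* placeholder for the signal */

    /*
     * Decide whether a newly .ready'd file requires an immediate
     * directory scan.  "fname" is the file name with the ".ready"
     * suffix already stripped.
     */
    static void
    MaybeForceDirScan(const char *fname, XLogSegNo last_archived_segno)
    {
        if (IsXLogFileName(fname))
        {
            TimeLineID  tli;
            XLogSegNo   segno;

            XLogFromFileName(fname, &tli, &segno, wal_segment_size);

            /*
             * A regular WAL segment ahead of the archiver's position
             * will be found by trying the next file in sequence, so no
             * scan is needed.
             */
            if (segno > last_archived_segno)
                return;
        }

        /*
         * Anything else (a timeline history file, a backup history
         * file, or a WAL segment at or behind the archiver's position)
         * must trigger an immediate directory scan so it isn't skipped
         * or archived late.
         */
        ForceDirScan();
    }

With a rule like this, the common case of a steady stream of in-order
WAL segments never pays for a directory scan, while the rare files that
could otherwise starve are picked up immediately.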