On Tue, Sep 7, 2021 at 1:28 PM Bossart, Nathan <bossa...@amazon.com> wrote:
> Thanks for chiming in. The limit of 64 in the multiple-files-per-
> directory-scan approach was mostly arbitrary. My earlier testing [0]
> with different limits didn't reveal any significant difference, but
> using a higher limit might yield a small improvement when there are
> several hundred thousand .ready files. IMO increasing the limit isn't
> really worth it for this approach. For 500,000 .ready files,
> ordinarily you'd need 500,000 directory scans. When 64 files are
> archived for each directory scan, you need ~8,000 directory scans.
> With 128 files per directory scan, you need ~4,000. With 256, you
> need ~2,000. The difference between 8,000 directory scans and 500,000
> is quite significant. The difference between 2,000 and 8,000 isn't
> nearly as significant in comparison.
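
For reference, the scan counts quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: `directory_scans` is a hypothetical helper, not anything in the patch, and it assumes every scan archives a full batch until the backlog runs out.

```python
def directory_scans(ready_files: int, files_per_scan: int = 1) -> int:
    """Number of directory scans needed to drain a backlog of .ready files,
    assuming each scan yields up to files_per_scan files to archive."""
    if files_per_scan < 1:
        raise ValueError("files_per_scan must be at least 1")
    # Integer ceiling division: a partly filled last batch still costs a scan.
    return -(-ready_files // files_per_scan)


if __name__ == "__main__":
    for limit in (1, 64, 128, 256):
        print(f"{limit:>3} files/scan -> {directory_scans(500_000, limit):>7} scans")
```

The exact figures are 7,813, 3,907 and 1,954 scans for limits of 64, 128 and 256, which matches the rounded ~8,000, ~4,000 and ~2,000 in the quoted text.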

That's certainly true. I guess what I don't understand about the
multiple-files-per-directory-scan implementation is what happens when
an event occurs that would require the keep-trying-the-next-file
approach to perform a forced scan. It seems to me that you still need
to force an immediate full scan, because if the idea is that you want
to, say, prioritize archiving of new timeline files over any others, a
cached list of files that you should archive next doesn't accomplish
that, just like keeping on trying the next file in sequence doesn't
accomplish that.

So I'm wondering if in the end the two approaches converge somewhat,
so that with either patch you get (1) some kind of optimization to
scan the directory less often, plus (2) some kind of notification
mechanism to tell you when you need to avoid applying that
optimization. If you wanted to, (1) could even include both batching
and then, when the batch is exhausted, trying files in sequence. I'm
not saying that's the way to go, but you could.

In the end, it seems less important that we do any particular thing
here and more important that we do something - but if prioritizing
timeline history files is important, then we have to preserve that
behavior.

--
Robert Haas
EDB: http://www.enterprisedb.com
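
To make the "(1) optimization plus (2) notification" shape concrete, here is a minimal sketch of a batching archiver loop with a forced-rescan flag. Everything in it is hypothetical and is not taken from either patch: `ArchiverSketch`, `notify_history_file` and `force_rescan` are illustrative names, an in-memory set stands in for the archive_status directory, and the ordering rule (history files first, then lowest name) is an assumption about the priority being preserved.

```python
class ArchiverSketch:
    """Toy model: cache a batch of files per directory scan, but let a
    notification force a fresh scan so that history files jump the queue."""

    def __init__(self, ready, batch_size=64):
        self.ready = set(ready)      # stands in for the *.ready files on disk
        self.batch_size = batch_size
        self.batch = []              # cached list of files to archive next
        self.force_rescan = False    # (2) the notification mechanism
        self.scans = 0               # how many "directory scans" were done

    def notify_history_file(self, name):
        """Called when a timeline history file becomes ready (assumed hook)."""
        self.ready.add(name)
        self.force_rescan = True

    def _scan(self):
        """One full directory scan: history files first, then lowest name."""
        self.scans += 1
        ordered = sorted(self.ready,
                         key=lambda n: (not n.endswith(".history"), n))
        self.batch = ordered[:self.batch_size]
        self.force_rescan = False

    def next_file(self):
        """Return the next file to archive, or None if nothing is ready."""
        # (1) reuse the cached batch unless it is empty or was invalidated.
        if self.force_rescan or not self.batch:
            self._scan()
        if not self.batch:
            return None
        name = self.batch.pop(0)
        self.ready.discard(name)     # pretend the file was archived
        return name
```

In this model a cached batch alone would archive the history file only after the current batch ran out, which is the problem described above; with the flag, the very next call rescans and returns it first.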