On Sun, Aug 22, 2021 at 10:31 PM Bossart, Nathan <bossa...@amazon.com> wrote:
> I ran this again on a bigger machine with 200K WAL files pending
> archive. The v9 patch took ~5.5 minutes, the patch I sent took ~8
> minutes, and the existing logic took just under 3 hours.
Hmm. On the one hand, 8 minutes > 5.5 minutes, and presumably the gap
would only get wider if the number of files were larger or if reading
the directory were slower. I am pretty sure that reading the directory
must be much slower in some real deployments where this problem has
come up. On the other hand, 8 minutes << 3 hours, and your patch would
win if somehow we had a ton of gaps in the sequence of files. I'm not
sure how likely that is to be the case - probably not very likely at
all if you aren't using an archive_command that cheats, but maybe
really common if you are.

Hmm, but I think that if the archive_command cheats by marking a bunch
of files done when it is tasked with archiving just one, your patch
will break, because, unless I'm missing something, it doesn't
re-evaluate whether things have changed on every pass through the
loop, as Dipesh's patch does.

So I guess I'm not quite sure I understand why you think this might be
the way to go. Maintaining the binary heap in lowest-priority-first
order is very clever, and the patch does look quite elegant. I'm just
not sure I understand the point.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
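[Editorial sketch, for readers following the thread: the two strategies under discussion can be illustrated with a small Python model. This is not PostgreSQL's actual pgarch.c code, and the names `archive_pending`, `status_dir`, and `archive_one` are hypothetical; it only shows the heap-based selection of the oldest `.ready` segment plus the per-iteration re-check that tolerates an archive_command which "cheats" by marking several files done at once.]

```python
import heapq
import os

def archive_pending(status_dir, archive_one):
    """Sketch of the heap-based archiver strategy (hypothetical, not pgarch.c).

    Scan the archive_status directory once and push every .ready segment
    onto a min-heap.  WAL segment names sort lexicographically in the same
    order as their LSNs, so the heap root is always the oldest pending
    segment.  Before archiving each popped entry, re-check that its .ready
    file still exists, so a cheating archive_command that renamed several
    .ready files to .done in one call does not cause duplicate archiving.
    """
    heap = []
    for name in os.listdir(status_dir):
        if name.endswith(".ready"):
            heapq.heappush(heap, name[: -len(".ready")])

    archived = []
    while heap:
        wal = heapq.heappop(heap)
        ready = os.path.join(status_dir, wal + ".ready")
        if not os.path.exists(ready):
            # Already marked done behind our back; skip without archiving.
            continue
        archive_one(wal)
        os.rename(ready, os.path.join(status_dir, wal + ".done"))
        archived.append(wal)
    return archived
```

The point of contention in the thread maps onto the two halves of this sketch: the single `os.listdir` plus heap avoids repeated directory scans (fast when the directory is huge), while the `os.path.exists` re-check is the kind of per-pass re-evaluation that Dipesh's patch performs and that a heap built once would otherwise miss.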