On Mon, Mar 21, 2022 at 12:58 AM Michail Nikolaev <michail.nikol...@gmail.com> wrote:
> Hm, not sure here
> AFAIK current implementation does not produce repeated FPIs. Page is
> marked as dirty on the first bit. So, others LP_DEAD (if not set by
> single scan) do not generate FPI until checkpoint is ready.

There is one FPI per checkpoint for any leaf page that is modified during that checkpoint. The difference between having that happen once or twice per leaf page and having that happen many more times per leaf page could be very large.

Of course, it might not make much difference. Who knows? But if you're not willing to measure it then we'll never know. What version are you using here? How frequently were checkpoints occurring in the period in question, and how does that compare to normal? You didn't even include this basic information.

Many things have changed in this area already, and it's rather unclear how much just upgrading to Postgres 14 would help. I think that it's possible that it would help you here a great deal. I also think it's possible that it wouldn't help at all. I don't know which it is, and I wouldn't expect to know without careful testing -- it's too complicated, and likely would be even if all of the information about the application were available.

The main reason that this can be so complex is that FPIs are caused by more frequent checkpoints, but *also* cause more frequent checkpoints in turn. So you could have a "death spiral" with FPIs -- the effect is nonlinear, which has the potential to lead to pathological, chaotic behavior. The impact on response time is *also* nonlinear and chaotic, in turn.

Sometimes it's possible to address things like this quite well with relatively simple solutions that at least work well in most cases -- just avoiding getting into a "death spiral" might be all it takes. As I said, maybe that won't be possible here, but it should be carefully considered first.

Not setting LP_DEAD bits because there are currently "too many FPIs" requires defining what that actually means, which seems very difficult because of these nonlinear dynamics. What do you do when there were too many FPIs for a long time, but also too much avoidance of them earlier on? It's very complicated.

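To make the feedback loop concrete, here is a toy model of it. This is only a sketch under made-up assumptions, not how Postgres actually schedules checkpoints: the function name `simulate`, the 100-byte `RECORD_BYTES` figure, the round-robin workload, and the single WAL-volume limit (a crude stand-in for max_wal_size) are all hypothetical. The first write to a page after a checkpoint costs a full-page image, later writes cost a small record, and a checkpoint is forced once enough WAL has accumulated.

```python
PAGE_BYTES = 8192    # one full-page image (default Postgres block size)
RECORD_BYTES = 100   # assumed size of an ordinary WAL record (made-up number)


def simulate(num_pages, num_writes, wal_limit_bytes):
    """Return (checkpoints, fpis, wal_bytes) for a round-robin write workload."""
    dirtied = set()       # pages already modified since the last checkpoint
    wal_since_ckpt = 0    # WAL generated since the last checkpoint
    checkpoints = fpis = wal_total = 0
    for i in range(num_writes):
        page = i % num_pages
        if page in dirtied:
            cost = RECORD_BYTES
        else:
            # First touch since the last checkpoint: log a full-page image.
            dirtied.add(page)
            cost = PAGE_BYTES
            fpis += 1
        wal_since_ckpt += cost
        wal_total += cost
        if wal_since_ckpt >= wal_limit_bytes:
            # WAL volume forces a checkpoint, so every page needs a new FPI.
            checkpoints += 1
            dirtied.clear()
            wal_since_ckpt = 0
    return checkpoints, fpis, wal_total


if __name__ == "__main__":
    for limit_mb in (4, 16):
        result = simulate(1000, 100_000, limit_mb * 1024 * 1024)
        print(limit_mb, "MB limit ->", result)
```

With these numbers, 1000 pages need about 8 MB of FPIs after each checkpoint. Under the 4 MB limit the next checkpoint always arrives before the working set has been logged once, so all 100,000 writes become FPIs across 195 checkpoints. Under the 16 MB limit there are 2,000 FPIs and a single checkpoint. The cliff between the two is the nonlinearity in question, and a real system adds timing and concurrency effects that this model ignores.
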
That's why I'm emphasizing solutions that focus on limiting the downside of not setting LP_DEAD bits, which is local information (not system-wide information) that is much easier to understand and target in the implementation.

--
Peter Geoghegan