On Tue, Dec 17, 2024 at 1:46 PM Tomas Vondra <to...@vondra.me> wrote:
>
> On 12/17/24 18:06, Melanie Plageman wrote:
> > On Tue, Dec 17, 2024 at 9:11 AM Tomas Vondra <to...@vondra.me> wrote:
> >>
> >> On 12/16/24 19:49, Melanie Plageman wrote:
> >>
> >>> No, I'm talking about the behavior of causing small pockets of
> >>> all-frozen pages which end up being smaller than SKIP_PAGES_THRESHOLD
> >>> and are then scanned (even though they are already frozen). What I
> >>> describe in that email I cited is that because we freeze
> >>> opportunistically when we have or will emit an FPI, and bgwriter will
> >>> write out blocks in clocksweep order, we end up with random pockets of
> >>> pages getting frozen during/after a checkpoint. Then in the next
> >>> vacuum, we end up scanning those all-frozen pages again because the
> >>> ranges of frozen pages are smaller than SKIP_PAGES_THRESHOLD. This is
> >>> mostly going to happen for an insert-only workload. I'm not saying
> >>> freezing the pages is bad, I'm saying that causing these pockets of
> >>> frozen pages leads to scanning all-frozen pages on future vacuums.
> >>>
> >>
> >> Yeah, this interaction between the components is not great :-( But can
> >> we think of a way to reduce the fragmentation? What would need to change?
> >
> > Well reducing SKIP_PAGES_THRESHOLD would help.
>
> How does SKIP_PAGES_THRESHOLD change the fragmentation? I think that's
> the side that's affected by the fragmentation, but it's really due to
> the eager freezing / bgwriter evictions, etc. If the threshold is set to
> 1 (i.e. to always skip), that just lowers the impact, but the relation
> is still as fragmented as before, no?

Yep, exactly. It doesn't help with fragmentation. It just helps us not
scan all-frozen pages. The question is whether or not the
fragmentation on its own matters. I think it would be better if we
didn't have it -- we could potentially do larger reads, for example,
if we have one continuous block of pages that are not all-frozen (most
likely when using the read stream API).

> > And unfortunately we do
> > not know if the skippable pages are all-frozen without extra
> > visibilitymap_get_status() calls -- so we can't decide to avoid
> > scanning ranges of skippable pages because they are frozen.
> >
>
> I may be missing something, but doesn't find_next_unskippable_block()
> already get those bits? In fact, it even checks VISIBILITYMAP_ALL_FROZEN
> but only for aggressive vacuum. But even if that wasn't the case, isn't
> checking the VM likely much cheaper than vacuuming the heap page?

find_next_unskippable_block() has them, but then if we do decide to
skip a range of pages, back in heap_vac_scan_next_block() where we
decide whether or not to skip a range using SKIP_PAGES_THRESHOLD, we
know that the pages in the range are all-visible (otherwise they
wouldn't be skippable) but we no longer know which of them were
all-frozen.

> >> Maybe the freezing code could check how many of the nearby pages are
> >> frozen, and consider that together with the FPI write?
> >
> > That's an interesting idea. We wouldn't have any guaranteed info
> > because we only have a lock on the page we are considering freezing.
> > But we could keep track of the length of a run of pages we are
> > freezing and opportunistically freeze pages that don't require
> > freezing if they follow one or more pages requiring freezing.
>
> I don't think we need a "guaranteed" information - a heuristics that's
> correct most of the time (say, >90%?) ought to be good enough. I mean,
> it has to be, because we'll never get a rule that's correct 100%.
> So even just looking at a batch of pages in VM should be enough, no?
>
> > But I don't know how much more this buys us than removing
> > SKIP_PAGES_THRESHOLD. Since it would "fix" the fragmentation, perhaps
> > it makes larger future vacuum reads possible. But I wonder how much
> > benefit it would be vs complexity.
> >
>
> I think that depends on which cost we're talking about. If we only talk
> about the efficiency of a single vacuum, then it probably does not help
> very much. I mean, if we assume the relation is already fragmented, then
> it seems to be cheaper to vacuum just the pages that need it (as if with
> SKIP_PAGES_THRESHOLD=1).
>
> But if we're talking about long-time benefits, in reducing the amount of
> freezing needed overall, maybe it'd be a win? I don't know.

Yea, it just depends on whether or not the pages we freeze for this
reason are likely to stay frozen.

I think I misspoke in saying we want to freeze pages next to pages
requiring freezing. What we really want to do is freeze pages next to
pages that are being opportunistically frozen -- because those are the
ones that are creating the fragmentation. But, then where do you draw
the line? You won't know if you are creating lots of random holes
until after you've skipped opportunistically freezing some pages --
and by then it's too late.

- Melanie
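
To make the cost being discussed concrete, here is a rough toy model (not
PostgreSQL code) of how small pockets of all-frozen pages end up being read
again by a non-aggressive vacuum. It assumes the rule described above: a run
of skippable (all-visible) pages is skipped only when it is at least the
threshold long. ToyPage and toy_count_frozen_pages_scanned() are hypothetical
names invented for this sketch, and it ignores details of the real scan such
as locking and any special handling of particular blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy stand-in for the two visibility map bits of a heap page.
 * Hypothetical type, for illustration only.
 */
typedef struct ToyPage
{
    bool all_visible;
    bool all_frozen;    /* only meaningful when all_visible is set */
} ToyPage;

/*
 * Count the all-frozen pages that would still be scanned when runs of
 * skippable (all-visible) pages shorter than `threshold` are scanned
 * instead of skipped. threshold = 1 means "always skip".
 */
size_t
toy_count_frozen_pages_scanned(const ToyPage *pages, size_t npages,
                               size_t threshold)
{
    size_t scanned_frozen = 0;
    size_t i = 0;

    assert(threshold >= 1);

    while (i < npages)
    {
        size_t run_start = i;
        size_t frozen_in_run = 0;

        /* A page that is not all-visible is always scanned. */
        if (!pages[i].all_visible)
        {
            i++;
            continue;
        }

        /* Walk to the end of this run of skippable pages. */
        while (i < npages && pages[i].all_visible)
        {
            if (pages[i].all_frozen)
                frozen_in_run++;
            i++;
        }

        /* Runs shorter than the threshold are scanned anyway. */
        if (i - run_start < threshold)
            scanned_frozen += frozen_in_run;
    }

    return scanned_frozen;
}
```

In this model, a 3-page frozen pocket is rescanned with a threshold of 32 but
not with a threshold of 1, while the layout of the pockets is identical in
both cases. That is the distinction drawn above: lowering the threshold
reduces the scanning cost without changing the fragmentation.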