On Thu, Feb 15, 2024 at 3:13 PM Andres Freund <and...@anarazel.de> wrote:
> > This is why I don't think that the tuples with lower page offset
> > numbers are in any way significant here. The significant part is
> > whether or not you'll actually need to visit more than one leaf page
> > in the first place (plus the penalty from not being able to reorder
> > the work across page boundaries in your initial v1 of prefetching).
>
> To me your phrasing just seems to reformulate the issue.

What I said to Tomas seems very obvious to me. I think that there
might have been some kind of miscommunication (not a real
disagreement). I was just trying to work through that.

> In practical terms you'll have to wait for the full IO latency when fetching
> the table tuple corresponding to the first tid on a leaf page. Of course
> that's also the moment you had to visit another leaf page. Whether the stall
> is due to visiting another leaf page or due to processing the first entry on
> such a leaf page is a distinction without a difference.

I don't think anybody said otherwise?

> > > That's certainly true / helpful, and it makes the "first entry" issue
> > > much less common. But the issue is still there. Of course, this says
> > > nothing about the importance of the issue - the impact may easily be so
> > > small it's not worth worrying about.
> >
> > Right. And I want to be clear: I'm really *not* sure how much it
> > matters. I just doubt that it's worth worrying about in v1 -- time
> > grows short. Although I agree that we should commit a v1 that leaves
> > the door open to improving matters in this area in v2.
>
> I somewhat doubt that it's realistic to aim for 17 at this point.

That's a fair point. Tomas?

> We seem to
> still be doing fairly fundamental architectural work. I think it might be the
> right thing even for 18 to go for the simpler only-a-single-leaf-page
> approach though.

I definitely think it's a good idea to have that as a fallback
option. And to not commit ourselves to having something better than
that for v1 (though we probably should commit to making that possible
in v2).

> I wonder if there are prerequisites that can be tackled for 17. One idea is to
> work on infrastructure to provide executor nodes with information about the
> number of tuples likely to be fetched - I suspect we'll trigger regressions
> without that in place.

I don't think that there'll be regressions if we just take the simpler
only-a-single-leaf-page approach.
At least it seems much less likely.

> One way to *sometimes* process more than a single leaf page, without having to
> redesign kill_prior_tuple, would be to use the visibilitymap to check if the
> target pages are all-visible. If all the table pages on a leaf page are
> all-visible, we know that we don't need to kill index entries, and thus can
> move on to the next leaf page

It's possible that we'll need a variety of different strategies.
nbtree already has two such strategies in _bt_killitems(), in a way.
Though its "Modified while not pinned means hinting is not safe" path
(LSN doesn't match canary value path) seems pretty naive. The
prefetching stuff might present us with a good opportunity to replace
that with something fundamentally better.

--
Peter Geoghegan