On Tue, Apr 22, 2025 at 2:34 PM Tomas Vondra <to...@vondra.me> wrote: > Yeah, that makes sense, although I've been thinking about this a bit > differently. I haven't been trying to establish a new "component" to > manage prefetching. For me the question was what's the right layer, so > that unnecessary details don't leak into AM and/or executor.
FWIW that basically seems equivalent to what I said. If there's any difference at all between what each of us has said, then it's only a difference in emphasis. The "index scan manager" doesn't just manage prefetching -- it manages the whole index scan, including details that were previously only supposed to be known inside index AMs. It can do so while weighing all relevant factors -- regardless of whether they're related to the index structure or the heap structure. It would be possible to (say) do everything at the index AM level instead. But then we'd be teaching index AMs about heap/table AM related costs, which would be a bad design, primarily because it would have to duplicate the same logic in every supported index AM. Better to have one dedicated layer that has an abstract-ish understanding of both index AM scan costs, and table AM scan costs. It needs to be abstract, but not too abstract -- costs like "read one index leaf page" generalize well across all index AMs. And costs like "read one table AM page" should also generalize quite well, at least across block-based table AMs. You primarily care about "doing the layering right", while I primarily care about "making sure that one layer can see all relevant costs". ISTM that these are two sides of the same coin. > It requires exchanging some additional details with the AM, provided by > the new callbacks. I think of it as primarily externalizing decisions about index page accesses. The index AM reads the next leaf page to be read because the index scan manager tells it to. The index AM performs killitems exactly as instructed by the index scan manager. And the index AM doesn't really own as much context about the progress of the scan -- that all lives inside the scan manager instead. The scan manager has a fairly fuzzy idea about how the index AM organizes data, but that shouldn't matter. > It seems the indexam.c achieves both your and mine goals, more or less. Agreed. > Yes. I wonder if we should introduce a separate abstraction for this, as > a subset of indexam.c. I like that idea. > My argument was (a) ability to disable prefetching, and fall back to the > old code if needed, and (b) handling use cases where prefetching does > not work / is not implemented, even if only temporarily (e.g. ordered > scan in SP-GiST). Maybe (a) is unnecessarily defensive, and (b) may not > be needed. Not sure. We don't need to make a decision on this for some time, but I still lean towards forcing index AMs to make a choice between this new interface, and the old amgettuple interface. > > I don't see why prefetching should be mandatory with this new > > interface. Surely it has to have adaptive "ramp-up" behavior already, > > even when we're pretty sure that prefetching is a good idea from the > > start? > > > > Possibly, I may be too defensive. And perhaps in cases where we know the > prefetching can't help we could disable that for the read_stream. Shouldn't the index scan manager be figuring all this out for us, automatically? Maybe that works in a very trivial way, at first. The important point is that the design be able to support these requirements in some later iteration of the feature -- though it's unlikely to happen in the first Postgres version that the scan manager thing appears in. > I think hash should be fairly easy to support. But I was really thinking > about doing SP-GiST, exactly because it's very different in some > aspects, and I wanted to validate the design on that (for hash I think > it's almost certain it's OK). WFM. There are still bugs in SP-GiST (and GiST) index-only scans: https://www.postgresql.org/message-id/cah2-wz%3dpqoziyrsrnn5jatfxwxy7-bjchz9s355lh8dt%3d5q...@mail.gmail.com It would be nice if the new index scan manager interface could fix that bug, at least in the case of SP-GiST. By generalizing the approach that nbtree takes, where we hang onto a leaf buffer pin. Admittedly this would necessitate changes to SP-GiST VACUUM, which doesn't cleanup lock any pages, but really has to in order to fix the underlying bug. There are draft patches that try to fix the bug, which might be a useful starting point. > I think the cases affected by this the most are index-only scans on > all-visible tables that fit into shared buffers, with > correlated/sequential pattern. Or even regular index scans with all data > in shred buffers. My hope is that the index scan manager can be taught to back off when this is happening, to avoid the regressions. Or that it can avoid them by only gradually ramping up the prefetching. Does that sound plausible to you? -- Peter Geoghegan