On Tue, Dec 9, 2025 at 6:08 PM Andres Freund <[email protected]> wrote: > On 2025-12-09 16:40:17 -0500, Robert Haas wrote: > > On Fri, Aug 29, 2025 at 4:38 AM Julien Tachoires <[email protected]> > > wrote: > > Potentially, there could be a performance problem > > I think the big performance hazard with this is repeated deforming. The > scankey infrastructure deforms attributes one-by-one *and* it does not > "persist" the work of deforming for later accesses. So if you e.g. have > something like > > SELECT sum(col_29) FROM tbl WHERE col_30 = common_value; > or > SELECT * FROM tbl WHERE col_30 = common_value; > > we'll now deform col_30 in isolation for the ScanKey evaluation and then we'll > deform columns 1-29 in the slot (because we always deform all the leading > columns), during projection.
Hmm, this is a good point, and I agree that it's a huge challenge for this patch set. Repeated tuple deforming is REALLY expensive, which is why we've spent so much energy trying to use slots in an as many places as possible. I find it easy to believe that HeapKeyTest's loop over heap_getattr() is going to prohibitively painful and that this code will need to somehow also be slot-ified for this to be a viable project. > I don't really see this being viable without first tackling two nontrivial > projects: > > 1) Make slot deforming for expressions & projections selective, i.e. don't > deform all the leading columns, but only ones that will eventually be > needed > 2) Perform ScanKey evaluation in slot form, to be able to cache the deforming > and to make deforming of multiple columns sufficiently efficient. IOW, I agree that we probably need to do #2. I am not entirely sure about #1. I'm a little afraid that trying to skip over columns without deforming them will add a bunch of code complexity that doesn't really pay off. You have to do the bookkeeping to know what to skip, and then how much are you really gaining by skipping it? If you can skip over a bunch of fixed-width columns, that's cool, but it's probably fairly normal to have lots of varlena columns, and then I don't really see that you're gaining much here. You still have to iterate through the tuple, and not storing the pointer to the start of each column as you find it doesn't seem like it will save much. What's your reasoning behind thinking that #1 will be necessary? > > So, somewhat to my surprise, I think that v4-0001 might be basically > > fine. I wonder if anyone else sees a problem that I'm missing? > > I doubt this would be safe as-is: ISTM that if you release the page lock > between tuples, things like the number of items on the page can change. But we > store stuff like that in registers / on the stack, which could change while > the lock is not held. > > We could refetch the number items on the page for every loop iteration, but > that'd probably not be free. OTOH, it's probably nothing compared to the cost > of relocking the page... We still hold a pin, though, which I think means very little can change. More items can be added to the page, so we might want to refresh the number of items on the page at least when we think we're done, but I believe that any sort of more invasive page rearrangement would be precluded by the pin. I kind of wonder if it would be good to make a change along the lines of v4-0001 even if this patch set doesn't move forward overall, or will need a lot of slot-ification to do so. It seems weird to me that we're OK with calling out to arbitrary code with a buffer lock held, and even weirder that whether or not we do that depends on whether SO_ALLOW_PAGEMODE was set. I don't think a difference of this kind between pagemode behavior and non-pagemode behavior would survive review if someone proposed it today; the fact that it works the way it does is probably an artifact of this mechanism having been added twenty years ago when the project was in a very different place. -- Robert Haas EDB: http://www.enterprisedb.com
