On Fri, Oct 30, 2020 at 6:26 PM Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Yeah, you need to access the old tuple to update its t_ctid, but
> accessing it twice is still more expensive than accessing it once. Maybe
> you could optimize it somewhat by keeping the buffer pinned or
> something. Or push the responsibility down to the table AM, passing the
> AM only the modified columns, and let the AM figure out how to deal with
> the columns that were not modified, hoping that it can do something smart.

Just as a point of possible interest, back when I was working on
zheap, I sort of wanted to take this in the opposite direction. In
effect, a zheap tuple has system columns that don't exist for a heap
tuple, and you can't do an update or delete without knowing what the
values for those columns are, so zheap had to just refetch the tuple,
but that sucked in comparisons with the existing heap, which didn't
have to do the refetch. At the time, I thought maybe the right idea
would be to extend things so that a table AM could specify an
arbitrary set of system columns that needed to be bubbled up to the
point where the update or delete happens, but that seemed really
complicated to implement and I never tried.

Here it seems like we're thinking of going the other way, and just
always doing the refetch. That is of course fine for zheap comparative
benchmarks: instead of making zheap faster, we just make the heap
slower! Well, sort of.

I didn't think about the benefits of the refetch approach when the
tuples are wide. That does cast a somewhat different light on things.
I suppose we could have both methods and choose the one that seems
likely to be faster in particular cases, but that seems like way too
much machinery. Maybe there's some way to further optimize accessing
the same tuple multiple times in rapid succession to claw back some of
the lost performance in the slow cases, but I don't have a specific
idea.

--
Robert Haas
EDB: http://www.enterprisedb.com