I'm planning to change ExecRestrPos and the routines it calls so that an updated TupleTableSlot holding the restored-to tuple is explicitly returned.
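To make the intent concrete, here is a toy sketch of the proposed shape of the API, using simplified stand-in types (these are not PostgreSQL's real struct definitions, and ExecRestrPos_sketch is a hypothetical name — today the function returns void):

```c
#include <assert.h>

/* Toy stand-ins for the real executor types. */
typedef struct HeapTupleData { int t_id; } HeapTupleData;
typedef struct TupleTableSlot { HeapTupleData *tts_tuple; } TupleTableSlot;

typedef struct PlanState
{
    HeapTupleData  marked;              /* pretend saved mark position */
    HeapTupleData  current;             /* scan's current-tuple workspace */
    TupleTableSlot ps_ResultTupleSlot;  /* node's result slot */
} PlanState;

/* Proposed shape: restore the position, explicitly re-point the result
 * slot at the restored tuple, and hand the updated slot back. */
static TupleTableSlot *
ExecRestrPos_sketch(PlanState *node)
{
    node->current = node->marked;                        /* restore position */
    node->ps_ResultTupleSlot.tts_tuple = &node->current; /* refresh the slot */
    return &node->ps_ResultTupleSlot;
}
```

The point of the return value is that callers such as nodeMergeJoin would use whatever slot comes back, rather than assuming the slot's address is stable across a Restore.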
Currently, since nothing is explicitly done to the result Slot of a plan node when we restore its position, you might think that the Slot still points at the tuple that was current just before the Restore. You'd be wrong, though, at least for seqscan and indexscan plans (I haven't yet looked at the other node types that support mark/restore). The reason is that the restore operation changes the contents of a HeapTupleData struct in the scan state (rs_ctup or xs_ctup), and all the Slot really contains is a pointer to that struct.

This is really bad. In the first place, the Slot thinks it has a pin on the buffer containing its current tuple; after a Restore, it may have a pin on the wrong buffer. It seems to be sheer chance that we've not had bugs due to this. (The underlying scan does have a pin on the right buffer, but one can easily imagine sequences in which the scan could be cleared while the Slot is still assumed valid.) As of CVS tip the consequences could be even worse, because the Slot may contain pointers to extracted fields of the tuple, and those pointers are now out of sync with the tuple the Slot really contains.

So I think it's essential that we explicitly update the scan's result Slot during ExecRestrPos. It also seems like a good idea to make the function return the Slot. As far as I can tell, nodeMergeJoin has been depending on the assumption that the physical address of the result slot doesn't change during a Restore. That's true for all the current plan types, but since the ExecProcNode API isn't designed to assume that a node always returns the same Slot, it doesn't seem like ExecRestrPos should either.

			regards, tom lane
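[Postscript for readers following along: the aliasing hazard described above can be demonstrated with a toy model. These are simplified stand-in structs, not PostgreSQL's real definitions; the field names merely echo the real ones.]

```c
#include <assert.h>

/* Simplified stand-ins for the executor structs (not the real definitions). */
typedef struct HeapTupleData
{
    int t_id;                   /* pretend tuple identity */
} HeapTupleData;

typedef struct TupleTableSlot
{
    HeapTupleData *tts_tuple;   /* the slot holds only a pointer */
    int            tts_buffer;  /* buffer the slot believes it has pinned */
} TupleTableSlot;

typedef struct ScanState
{
    HeapTupleData  rs_ctup;     /* scan's current-tuple workspace */
    int            rs_cbuf;     /* buffer actually pinned by the scan */
    TupleTableSlot slot;        /* node's result slot */
} ScanState;

/* Fetch: the slot is pointed at the scan's workspace struct. */
static void scan_fetch(ScanState *ss, int id, int buf)
{
    ss->rs_ctup.t_id = id;
    ss->rs_cbuf = buf;
    ss->slot.tts_tuple = &ss->rs_ctup;
    ss->slot.tts_buffer = buf;
}

/* Restore: overwrites rs_ctup and rs_cbuf but never touches the slot,
 * so the slot's contents change underneath it while its buffer
 * bookkeeping goes stale -- the bug pattern described in the mail. */
static void scan_restore(ScanState *ss, int id, int buf)
{
    ss->rs_ctup.t_id = id;
    ss->rs_cbuf = buf;
}
```

After a fetch of tuple 1 in buffer 10 followed by a restore to tuple 2 in buffer 20, the slot silently reports tuple 2 while still claiming a pin on buffer 10.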