Re: Performance issues with v18 SQL-language-function changes

Tom Lane Thu, 01 May 2025 12:51:00 -0700

I wrote:
> I still haven't figured out why this wasn't a problem in the old
> version of functions.c.


Oh, got it: I was misremembering the API contract for
tuplestore_gettupleslot.  There are two possible ways for it to
load the slot with a tuple:

* tuple points into the tuplestore's memory, and the slot's
should_free flag is false.  In this case, tuplestore_clear
leaves the slot pointing to garbage, but it doesn't matter
because we won't try to access nor free the garbage.  (This
is the case that comment was talking about.)

* tuple points to a tuple palloc'd in *the caller's memory
context* --- not the tuplestore's memory as I was thinking.
In this case the slot's should_free flag is true.

Thus the bug arises because the caller's memory context is now the
relatively short-lived per-tuple context that fmgr_sql is called in,
where in prior versions it was the SQL function's long-lived fcontext.
The slot we're putting it in is also long-lived.  So what happens
is that on the next fmgr_sql call, after the per-tuple context has
been reset and thus discarded the palloc'd tuple, the slot still
remembers that it's supposed to free the tuple and does so.  Kaboom.
But in released branches, the slot and the palloc'd tuple are in the
same context and there's no such hazard.

The quick-hack patch I posted upthread fixes this by ensuring we
clear the slot (and thus let it pfree the tuple) before context
reset of the per-tuple context wipes the tuple from underneath it.

I'm still inclined to fix this by using the patch from [1] though.
Aside from the argument that it'd be better if v18 were like later
branches here, I now realize that there's at least one false statement
in my argument at [1]:

>> Given the lack of field complaints over the many years it's been
>> like this, there doesn't seem to be a live problem.  I think the
>> explanation for that is
>> (1) those mechanisms are never used to call set-returning functions,
>> (2) therefore, the tuplestore will not be called on to hold more
>> than one result row,
>> (3) therefore, it won't get large enough that it wants to spill
>> to disk,
>> (4) therefore, its potentially dangling resowner pointer is never
>> used.
>> However, this is an uncomfortably shaky chain of assumptions.

This example has shown that tuplestore.c is capable of spilling to
disk even if it is storing only a single tuple, which means (3) is
wrong, which means the tuplestore *can* touch its resowner pointer
if the function result is wide enough.  I wonder if this means we
have a reachable bug in released branches.

                        regards, tom lane

[1] https://www.postgresql.org/message-id/2443532.1744919968%40sss.pgh.pa.us

Re: Performance issues with v18 SQL-language-function changes

Reply via email to