Mat Arye <m...@timescale.com> writes:
> I found a memory leak in plpython3u. It happens when converting the
> argument for an spi plan execution (via plpy.execute() or similar). I've
> attached a sql file to create two plpython3u functions to reproduce the
> issue (memory_leak_test.sql).

I see the leak all right, and your patch does seem to fix it, so
I believe your diagnosis that PLy_output_convert() is what's leaking.

> I believe the issue is in the call to `PLy_output_convert` inside
> `PLy_spi_execute_plan`. I've attached a patch that reduces the memory usage
> to ~70MB from ~720 MB when using test_mem. I based what I did on what
> `PLy_input_convert` does. But I am not an expert in this part of the code
> so I am not sure the patch is correct. In particular, I don't fully grasp
> the comment in `PLy_input_convert` about why the scratch is reset before
> not after the conversion cycle and what that has to do with python
> refcounts. A review by an expert would be appreciated.

The corresponding concern here would be that some pass-by-reference
conversion result Datum could come back verbatim as an output of
SPI_execute_plan, and we mustn't reset the scratch context again
before we're done with using the Datum.  Sadly, I don't think it
will work if that happens, because what we do as soon as
SPI_execute_plan finishes is to call PLy_spi_execute_fetch_result,
and that calls PLy_input_from_tuple, which is one of the things
that will zap the scratch context.  So this doesn't feel very safe
... and indeed it falls over in plpython's existing regression tests.

However, I don't really see why we need to use that scratch context.
PLy_spi_execute_plan runs a subtransaction, so we could perfectly well
use the subtransaction's CurTransactionContext.  (IOW, this wouldn't
even be an issue if PLy_spi_subtransaction_begin didn't forcibly
switch back to oldcontext.)

A couple of other points:

* PLy_spi_execute_plan already has an outer-level variable named
oldcontext, so this coding will draw complaints about a shadowed
local variable in compilers that are picky about that.  But
I don't think you need the inner oldcontext variable anyway, since you
know perfectly well that the previous context is the outer oldcontext.
Just switch back to that after using CurTransactionContext.

* Looking around at other callers of PLy_output_convert, it seems
likely that PLy_cursor_plan() has this same problem.  (I think we're
okay though with the calls from PLy_exec_function and
PLy_modify_tuple, since those could only happen right before return
from the plpython function; there's no path to iterate them enough
times to create a meaningful leak.)

                        regards, tom lane


Reply via email to