Hao Zhang <zhrt1446384...@gmail.com> writes: > I found a problem when executing the plpython function: > After the plpython function returns an error, in the same session, if we > continue to execute > plpython function, the server panic will be caused.
Thanks for the report! I see the problem is that we're not expecting BeginInternalSubTransaction to fail. However, I'm not sure I like this solution, mainly because it's only covering a fraction of the problem. There are similarly unsafe usages in plperl, pltcl, and very possibly a lot of third-party PLs. I wonder if there's a way to deal with this issue without changing these API assumptions. The only readily-reachable error case in BeginInternalSubTransaction is this specific one about IsInParallelMode, which was added later than the original design and evidently with not a lot of thought or testing. The comment for it speculates about whether we could get rid of it, so I wonder if our thoughts about this ought to go in that direction. In any case, if we do proceed along the lines of catching errors from BeginInternalSubTransaction, I think your patch is a bit shy of a load because it doesn't do all the same things that other callers of PLy_spi_exception_set do. Looking at that made me wonder why the PLy_spi_exceptions lookup business was being duplicated by every caller rather than being done once in PLy_spi_exception_set. So 0001 attached is a quick refactoring patch to remove that code duplication, and then 0002 is your patch adapted to that. I also attempted to include a test case in 0002, but I'm not very satisfied with that. Your original test case seemed pretty expensive for the amount of code coverage it adds, so I tried to make it happen with debug_parallel_query instead. That does exercise the new code, but it does not exhibit the crash if run against unpatched code. That's because with this test case the error is only thrown in worker processes not the leader, so we don't end up with corrupted Python state in the leader. That result also points up that the original test case isn't very reliable for this either: you have to have parallel_leader_participation on, and you have to have the leader process at least one row, which makes it pretty timing-sensitive. On top of all that, the test would become useless if we do eventually get rid of the !IsInParallelMode restriction. So I'm kind of inclined to not bother with a test case if this gets to be committed in this form. Thoughts anyone? regards, tom lane
diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c index ff87b27de0..64ca47f218 100644 --- a/src/pl/plpython/plpy_spi.c +++ b/src/pl/plpython/plpy_spi.c @@ -28,7 +28,7 @@ static PyObject *PLy_spi_execute_query(char *query, long limit); static PyObject *PLy_spi_execute_fetch_result(SPITupleTable *tuptable, uint64 rows, int status); -static void PLy_spi_exception_set(PyObject *excclass, ErrorData *edata); +static void PLy_spi_exception_set(ErrorData *edata); /* prepare(query="select * from foo") @@ -470,8 +470,6 @@ PLy_commit(PyObject *self, PyObject *args) PG_CATCH(); { ErrorData *edata; - PLyExceptionEntry *entry; - PyObject *exc; /* Save error info */ MemoryContextSwitchTo(oldcontext); @@ -481,17 +479,8 @@ PLy_commit(PyObject *self, PyObject *args) /* was cleared at transaction end, reset pointer */ exec_ctx->scratch_ctx = NULL; - /* Look up the correct exception */ - entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode), - HASH_FIND, NULL); - - /* - * This could be a custom error code, if that's the case fallback to - * SPIError - */ - exc = entry ? entry->exc : PLy_exc_spi_error; /* Make Python raise the exception */ - PLy_spi_exception_set(exc, edata); + PLy_spi_exception_set(edata); FreeErrorData(edata); return NULL; @@ -517,8 +506,6 @@ PLy_rollback(PyObject *self, PyObject *args) PG_CATCH(); { ErrorData *edata; - PLyExceptionEntry *entry; - PyObject *exc; /* Save error info */ MemoryContextSwitchTo(oldcontext); @@ -528,17 +515,8 @@ PLy_rollback(PyObject *self, PyObject *args) /* was cleared at transaction end, reset pointer */ exec_ctx->scratch_ctx = NULL; - /* Look up the correct exception */ - entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode), - HASH_FIND, NULL); - - /* - * This could be a custom error code, if that's the case fallback to - * SPIError - */ - exc = entry ? entry->exc : PLy_exc_spi_error; /* Make Python raise the exception */ - PLy_spi_exception_set(exc, edata); + PLy_spi_exception_set(edata); FreeErrorData(edata); return NULL; @@ -594,8 +572,6 @@ void PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner) { ErrorData *edata; - PLyExceptionEntry *entry; - PyObject *exc; /* Save error info */ MemoryContextSwitchTo(oldcontext); @@ -607,17 +583,8 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner) MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; - /* Look up the correct exception */ - entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode), - HASH_FIND, NULL); - - /* - * This could be a custom error code, if that's the case fallback to - * SPIError - */ - exc = entry ? entry->exc : PLy_exc_spi_error; /* Make Python raise the exception */ - PLy_spi_exception_set(exc, edata); + PLy_spi_exception_set(edata); FreeErrorData(edata); } @@ -626,12 +593,21 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner) * internal query and error position. */ static void -PLy_spi_exception_set(PyObject *excclass, ErrorData *edata) +PLy_spi_exception_set(ErrorData *edata) { + PLyExceptionEntry *entry; + PyObject *excclass; PyObject *args = NULL; PyObject *spierror = NULL; PyObject *spidata = NULL; + /* Look up the correct exception */ + entry = hash_search(PLy_spi_exceptions, &(edata->sqlerrcode), + HASH_FIND, NULL); + + /* Fall back to SPIError if no entry */ + excclass = entry ? entry->exc : PLy_exc_spi_error; + args = Py_BuildValue("(s)", edata->message); if (!args) goto failure;
diff --git a/src/pl/plpython/expected/plpython_spi.out b/src/pl/plpython/expected/plpython_spi.out index 8853e2540d..26e1cc3c9a 100644 --- a/src/pl/plpython/expected/plpython_spi.out +++ b/src/pl/plpython/expected/plpython_spi.out @@ -464,3 +464,29 @@ SELECT plan_composite_args(); (3,label) (1 row) +-- +-- Error cases +-- +-- Check recovery from a subtransaction start failure; to make one +-- happen repeatably, try to do a SPI operation in an allegedly +-- parallel-safe function +CREATE FUNCTION falsely_marked_parallel_safe() RETURNS SETOF int AS +$$ +plpy.execute("select 2+2") +for i in range(0, 5): + yield (i) +$$ LANGUAGE plpython3u parallel safe; +SET debug_parallel_query = 1; +SELECT falsely_marked_parallel_safe(); +ERROR: error fetching next item from iterator +DETAIL: spiexceptions.InvalidTransactionState: cannot start subtransactions during a parallel operation +CONTEXT: Traceback (most recent call last): +PL/Python function "falsely_marked_parallel_safe" +parallel worker +SELECT falsely_marked_parallel_safe(); +ERROR: error fetching next item from iterator +DETAIL: spiexceptions.InvalidTransactionState: cannot start subtransactions during a parallel operation +CONTEXT: Traceback (most recent call last): +PL/Python function "falsely_marked_parallel_safe" +parallel worker +RESET debug_parallel_query; diff --git a/src/pl/plpython/plpy_cursorobject.c b/src/pl/plpython/plpy_cursorobject.c index 57e8f8ec21..8bfe05815b 100644 --- a/src/pl/plpython/plpy_cursorobject.c +++ b/src/pl/plpython/plpy_cursorobject.c @@ -98,7 +98,8 @@ PLy_cursor_query(const char *query) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -196,7 +197,8 @@ PLy_cursor_plan(PyObject *ob, PyObject *args) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -333,7 +335,8 @@ PLy_cursor_iternext(PyObject *self) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -403,7 +406,8 @@ PLy_cursor_fetch(PyObject *self, PyObject *args) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { diff --git a/src/pl/plpython/plpy_spi.c b/src/pl/plpython/plpy_spi.c index 64ca47f218..ca97f2eca6 100644 --- a/src/pl/plpython/plpy_spi.c +++ b/src/pl/plpython/plpy_spi.c @@ -77,7 +77,8 @@ PLy_spi_prepare(PyObject *self, PyObject *args) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -217,7 +218,8 @@ PLy_spi_execute_plan(PyObject *ob, PyObject *list, long limit) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -313,7 +315,8 @@ PLy_spi_execute_query(char *query, long limit) oldcontext = CurrentMemoryContext; oldowner = CurrentResourceOwner; - PLy_spi_subtransaction_begin(oldcontext, oldowner); + if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + return NULL; PG_TRY(); { @@ -534,7 +537,9 @@ PLy_rollback(PyObject *self, PyObject *args) * MemoryContext oldcontext = CurrentMemoryContext; * ResourceOwner oldowner = CurrentResourceOwner; * - * PLy_spi_subtransaction_begin(oldcontext, oldowner); + * if (!PLy_spi_subtransaction_begin(oldcontext, oldowner)) + * return NULL; + * * PG_TRY(); * { * <call SPI functions> @@ -551,12 +556,38 @@ PLy_rollback(PyObject *self, PyObject *args) * These utilities take care of restoring connection to the SPI manager and * setting a Python exception in case of an abort. */ -void +bool PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner) { - BeginInternalSubTransaction(NULL); - /* Want to run inside function's memory context */ - MemoryContextSwitchTo(oldcontext); + PG_TRY(); + { + /* Start subtransaction (could fail) */ + BeginInternalSubTransaction(NULL); + /* Want to run inside function's memory context */ + MemoryContextSwitchTo(oldcontext); + } + PG_CATCH(); + { + ErrorData *edata; + + /* Save error info */ + MemoryContextSwitchTo(oldcontext); + edata = CopyErrorData(); + FlushErrorState(); + + /* Ensure we restore original context and owner */ + MemoryContextSwitchTo(oldcontext); + CurrentResourceOwner = oldowner; + + /* Make Python raise the exception */ + PLy_spi_exception_set(edata); + FreeErrorData(edata); + + return false; + } + PG_END_TRY(); + + return true; } void @@ -580,6 +611,8 @@ PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner) /* Abort the inner transaction */ RollbackAndReleaseCurrentSubTransaction(); + + /* Ensure we restore original context and owner */ MemoryContextSwitchTo(oldcontext); CurrentResourceOwner = oldowner; diff --git a/src/pl/plpython/plpy_spi.h b/src/pl/plpython/plpy_spi.h index 98ccd21093..a43d803928 100644 --- a/src/pl/plpython/plpy_spi.h +++ b/src/pl/plpython/plpy_spi.h @@ -22,7 +22,7 @@ typedef struct PLyExceptionEntry } PLyExceptionEntry; /* handling of SPI operations inside subtransactions */ -extern void PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner); +extern bool PLy_spi_subtransaction_begin(MemoryContext oldcontext, ResourceOwner oldowner); extern void PLy_spi_subtransaction_commit(MemoryContext oldcontext, ResourceOwner oldowner); extern void PLy_spi_subtransaction_abort(MemoryContext oldcontext, ResourceOwner oldowner); diff --git a/src/pl/plpython/sql/plpython_spi.sql b/src/pl/plpython/sql/plpython_spi.sql index fcd113acaa..69a448cd68 100644 --- a/src/pl/plpython/sql/plpython_spi.sql +++ b/src/pl/plpython/sql/plpython_spi.sql @@ -320,3 +320,22 @@ SELECT cursor_fetch_next_empty(); SELECT cursor_plan(); SELECT cursor_plan_wrong_args(); SELECT plan_composite_args(); + +-- +-- Error cases +-- + +-- Check recovery from a subtransaction start failure; to make one +-- happen repeatably, try to do a SPI operation in an allegedly +-- parallel-safe function +CREATE FUNCTION falsely_marked_parallel_safe() RETURNS SETOF int AS +$$ +plpy.execute("select 2+2") +for i in range(0, 5): + yield (i) +$$ LANGUAGE plpython3u parallel safe; + +SET debug_parallel_query = 1; +SELECT falsely_marked_parallel_safe(); +SELECT falsely_marked_parallel_safe(); +RESET debug_parallel_query;