On Wed, Mar 13, 2024 at 12:01 PM Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
>
> On 2024-Mar-13, Jelte Fennema-Nio wrote:
> > Sadly I'm having a hard time reliably reproducing this race condition
> > locally. So it's hard to be sure what is happening here. Attached is a
> > patch with a wild guess as to what the issue might be (i.e. seeing an
> > outdated "active" state and thus passing the check even though the
> > query is not running yet)
>
> I tried leaving the original running in my laptop to see if I could
> reproduce it, but got no hits ... and we didn't get any other failures
> apart from the three ones already reported ... so it's not terribly high
> probability. Anyway I pushed your patch now since the theory seems
> plausible; let's see if we still get the issue to reproduce. If it
> does, we could make the script more verbose to hunt for further clues.
I hit this on my machine. With the attached diff I can reproduce it consistently (including with the most recent test patch). I suspect the cancel is arriving between the bind and execute steps.

Thanks,
--Jacob
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 6b7903314a..22ce7c07d9 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2073,6 +2073,9 @@ exec_bind_message(StringInfo input_message)
 	valgrind_report_error_query(debug_query_string);
 
 	debug_query_string = NULL;
+
+	if (strstr(psrc->query_string, "pg_sleep"))
+		sleep(1);
 }
 
 /*
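To make the timing argument concrete, here is a toy C model of the suspected race. It is not PostgreSQL code, and every name in it (backend, run_scenario, the PHASE_* values) is hypothetical. The assumption being modeled is that the session already looks "active" while it is still in the bind step, so a client that waits only for that state can send its cancel before execution begins. An injected delay in the bind step, analogous to the sleep(1) in the diff, turns an occasional failure into a deterministic one.

```c
/* Toy model only: names are hypothetical and do not come from PostgreSQL. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

enum phase { PHASE_IDLE, PHASE_BIND, PHASE_EXECUTE, PHASE_DONE };

static atomic_int backend_phase;    /* what an "is it active?" poll can observe */
static atomic_bool cancel_pending;  /* set by the client to request a cancel */
static atomic_int cancel_seen_in;   /* phase in which the backend noticed it */
static bool inject_delay;           /* stands in for the sleep(1) in the diff */

static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

static void *backend(void *arg)
{
    (void) arg;
    /* The session already looks "active" while it is still binding. */
    atomic_store(&backend_phase, PHASE_BIND);
    if (inject_delay)
        sleep_ms(200);              /* widen the bind/execute window */
    if (atomic_load(&cancel_pending)) {
        atomic_store(&cancel_seen_in, PHASE_BIND);
        return NULL;
    }
    atomic_store(&backend_phase, PHASE_EXECUTE);
    /* The "pg_sleep" query: poll for a cancel for up to about one second. */
    for (int i = 0; i < 100; i++) {
        if (atomic_load(&cancel_pending)) {
            atomic_store(&cancel_seen_in, PHASE_EXECUTE);
            return NULL;
        }
        sleep_ms(10);
    }
    atomic_store(&cancel_seen_in, PHASE_DONE);  /* cancel never observed */
    return NULL;
}

/* Runs one client/backend exchange; returns the phase that saw the cancel. */
int run_scenario(bool delay)
{
    pthread_t tid;
    int rc;

    atomic_store(&backend_phase, PHASE_IDLE);
    atomic_store(&cancel_pending, false);
    atomic_store(&cancel_seen_in, PHASE_IDLE);
    inject_delay = delay;           /* published to the thread by pthread_create */

    rc = pthread_create(&tid, NULL, backend, NULL);
    assert(rc == 0);

    /* Client: wait only until the query *looks* active, then cancel. */
    while (atomic_load(&backend_phase) == PHASE_IDLE)
        sleep_ms(1);
    atomic_store(&cancel_pending, true);

    rc = pthread_join(tid, NULL);
    assert(rc == 0);
    (void) rc;                      /* silence unused warnings under NDEBUG */
    return atomic_load(&cancel_seen_in);
}
```

With the delay, run_scenario(true) should report PHASE_BIND, meaning the cancel landed before the query started executing. Without it, the outcome depends on thread scheduling, which would be consistent with the failure being hard to reproduce.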