On 2018-02-12 15:43:49 -0500, Peter Eisentraut wrote:
> On 2/6/18 12:06, Andres Freund wrote:
> > On 2018-02-06 12:01:08 -0500, Peter Eisentraut wrote:
> >> On 2/1/18 20:35, Andres Freund wrote:
> >>> On February 1, 2018 11:13:06 PM GMT+01:00, Peter Eisentraut
> >>> <peter.eisentr...@2ndquadrant.com> wrote:
> >>>> Here is a patch to implement that idea. Do you have a way to test it
> >>>> repeatedly, or do you just randomly cancel queries?
> >>>
> >>> For me cancelling the long running parallel queries I tried reliably
> >>> triggers the issue. I encountered it while cancelling tpch q1 during JIT
> >>> work.
> >>
> >> Why does canceling a query result in elog(FATAL)? It should just be
> >> elog(ERROR), which wouldn't trigger this issue.
> >
> > The workers are shut down.
>
> I have used the setup mentioned in
> <https://www.postgresql.org/message-id/6a909374-2602-7136-8c70-397330a418f3%402ndquadrant.com>
> to reproduce this, without success. I have tried statement_timeout and
> manual cancels. Any other ideas?
>
> I don't doubt that the issue exists, but it would be nice to be able to
> reproduce it.

With your example I can reliably trigger the issue if I shut down the
server while the query is running:

^C2018-02-14 10:54:06.786 PST [22261][] LOG: received fast shutdown request
2018-02-14 10:54:06.786 PST [22261][] LOG: aborting any active transactions
2018-02-14 10:54:06.786 PST [22275][4/3] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22275][4/3] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22274][5/3] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22274][5/3] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.786 PST [22271][3/2] FATAL: terminating connection due to administrator command
2018-02-14 10:54:06.786 PST [22271][3/2] STATEMENT: select from t1 where a = 55;
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "logical replication launcher" (PID 22268) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "parallel worker" (PID 22274) exited with exit code 1
2018-02-14 10:54:06.787 PST [22261][] LOG: background worker "parallel worker" (PID 22275) exited with exit code 1
2018-02-14 10:54:06.788 PST [22261][] LOG: server process (PID 22271) was terminated by signal 11: Segmentation fault
2018-02-14 10:54:06.788 PST [22261][] DETAIL: Failed process was running: select from t1 where a = 55;
2018-02-14 10:54:06.788 PST [22261][] LOG: terminating any other active server processes
2018-02-14 10:54:06.789 PST [22285][] FATAL: the database system is shutting down
2018-02-14 10:54:06.789 PST [22261][] LOG: abnormal database system shutdown
2018-02-14 10:54:06.790 PST [22261][] LOG: database system is shut down

but only if I don't use EXPLAIN ANALYZE. Not quite sure what that is
about.

Your patch appears to fix the issue.

Greetings,

Andres Freund