On Tue, Nov 17, 2015 at 1:21 AM, Bert <bier...@gmail.com> wrote:
> Hey,
>
> I've just pulled and compiled the new code.
> I'm running a TPC-DS like test on different PostgreSQL installations, but
> running (max) 12queries in parallel on a server with 12cores.
> I've configured max_parallel_degree to 2, and I get messages that backend
> processes crash.
> I am running the same test now with 6queries in parallel, and parallel
> degree to 2, and they seem to work. for now. :)
>
> This is the output I get in /var/log/messages
> Nov 16 20:40:05 woludwha02 kernel: postgres[22918]: segfault at
> 7fa3437bf104 ip 0000000000490b56 sp 00007ffdf2f083a0 error 6 in
> postgres[400000+5b5000]
>

Thanks for reporting the issue.
I think what's going on here is that when a session doesn't get any workers, we shut down the Gather node, which internally destroys the dynamic shared memory segment as well. However, under the current design that same segment is still needed for the scan done by the master backend. So I think the fix would be to shut down only the workers, which won't actually do anything in this scenario.

I have tried to reproduce this issue with a simpler test case, as below:

Create two tables with large data:

CREATE TABLE t1(c1, c2) AS
SELECT g, repeat('x', 5) FROM generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS
SELECT g, repeat('x', 5) FROM generate_series(1, 1000000) g;

Set max_worker_processes = 2 in postgresql.conf

Session-1
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t1 where c1 > 10000;

Session-2
set max_parallel_degree=4;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
Explain analyze select count(*) from t2 where c1 > 10000;

The trick to reproduce is that the Explain statement in Session-2 needs to be executed immediately after the Explain statement in Session-1.

The attached patch fixes the issue for me.

I think we could also go for a somewhat more invasive fix here: if the statement didn't find any workers, reset the dsm and also reset the execution tree (which in the case of seq scan means clearing the parallel scan desc and maybe a few more fields in the scan desc) so that it performs a seq scan. I am not sure how future-proof such a change would be, because resetting some of the fields in the execution tree and expecting it to work in all cases might not be feasible for all nodes.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
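
To make the lifetime problem described above concrete, here is a small toy model in Python. It is only a sketch and not PostgreSQL code: the names SharedSegment, ToyGather, shutdown_node, shutdown_workers and count_rows are all hypothetical, and the RuntimeError merely stands in for the crash a backend would hit when touching a destroyed segment.

```python
class SharedSegment:
    """Toy stand-in for a dynamic shared memory segment (hypothetical, not PostgreSQL's API)."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.attached = True

    def destroy(self):
        self.attached = False

    def read(self):
        # Reading a destroyed segment models the invalid memory access.
        if not self.attached:
            raise RuntimeError("segment already destroyed")
        return self.rows


class ToyGather:
    """Hypothetical model of a Gather node whose leader scans through the segment too."""

    def __init__(self, rows, workers_launched):
        self.segment = SharedSegment(rows)
        self.workers = ["worker"] * workers_launched

    def shutdown_node(self):
        # Problematic path: stops the workers AND destroys the segment.
        self.workers.clear()
        self.segment.destroy()

    def shutdown_workers(self):
        # Proposed fix: only stop workers; effectively a no-op when none were launched.
        self.workers.clear()

    def count_rows(self, fixed):
        if not self.workers:
            # No workers were obtained, so the leader has to do the scan itself.
            if fixed:
                self.shutdown_workers()
            else:
                self.shutdown_node()
        # The leader's scan still depends on the segment being alive.
        return len(self.segment.read())
```

In this model, ToyGather(range(5), 0).count_rows(fixed=True) completes the leader's scan, while the same call with fixed=False raises because the segment was destroyed before the leader read from it. With workers available, neither shutdown path is taken early and both variants behave the same.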
fix_early_dsm_destroy_v1.patch
Description: Binary data
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers