On 31 December 2014 at 14:20, Thom Brown <t...@linux.com> wrote: > On 18 December 2014 at 16:03, Amit Kapila <amit.kapil...@gmail.com> wrote: > >> >> >> On Thu, Dec 18, 2014 at 9:22 PM, Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > >> > On Mon, Dec 8, 2014 at 10:40 AM, Amit Kapila <amit.kapil...@gmail.com> >> wrote: >> > > >> > > On Sat, Dec 6, 2014 at 5:37 PM, Stephen Frost <sfr...@snowman.net> >> wrote: >> > > > >> > > >> > > So to summarize my understanding, below are the set of things >> > > which I should work on and in the order they are listed. >> > > >> > > 1. Push down qualification >> > > 2. Performance Data >> > > 3. Improve the way to push down the information related to worker. >> > > 4. Dynamic allocation of work for workers. >> > > >> > > >> > >> > I have worked on the patch to accomplish above mentioned points >> > 1, 2 and partly 3 and would like to share the progress with community. >> >> Sorry forgot to attach updated patch in last mail, attaching it now. >> > > When attempting to recreate the plan in your example, I get an error: > > ➤ psql://thom@[local]:5488/pgbench > > # create table t1(c1 int, c2 char(500)) with (fillfactor=10); > CREATE TABLE > Time: 13.653 ms > > ➤ psql://thom@[local]:5488/pgbench > > # insert into t1 values(generate_series(1,100),'amit'); > INSERT 0 100 > Time: 4.796 ms > > ➤ psql://thom@[local]:5488/pgbench > > # explain select c1 from t1; > ERROR: could not register background process > HINT: You may need to increase max_worker_processes. > Time: 1.659 ms > > ➤ psql://thom@[local]:5488/pgbench > > # show max_worker_processes ; > max_worker_processes > ---------------------- > 8 > (1 row) > > Time: 0.199 ms > > # show parallel_seqscan_degree ; > parallel_seqscan_degree > ------------------------- > 10 > (1 row) > > > Should I really need to increase max_worker_processes to >= > parallel_seqscan_degree? If so, shouldn't there be a hint here along with > the error message pointing this out? And should the error be produced when > only a *plan* is being requested? > > Also, I noticed that where a table is partitioned, the plan isn't > parallelised: > > # explain select distinct bid from pgbench_accounts; > > > QUERY > PLAN > > ---------------------------------------------------------------------------------------- > HashAggregate (cost=1446639.00..1446643.99 rows=499 width=4) > Group Key: pgbench_accounts.bid > -> Append (cost=0.00..1321639.00 rows=50000001 width=4) > -> Seq Scan on pgbench_accounts (cost=0.00..0.00 rows=1 width=4) > -> Seq Scan on pgbench_accounts_1 (cost=0.00..4279.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_2 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_3 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_4 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_5 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_6 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_7 (cost=0.00..2640.00 > rows=100000 width=4) > ... > -> Seq Scan on pgbench_accounts_498 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_499 (cost=0.00..2640.00 > rows=100000 width=4) > -> Seq Scan on pgbench_accounts_500 (cost=0.00..2640.00 > rows=100000 width=4) > (504 rows) > > Is this expected? >
Another issue (FYI, pgbench2 initialised with: pgbench -i -s 100 -F 10 pgbench2): ➤ psql://thom@[local]:5488/pgbench2 # explain select distinct bid from pgbench_accounts; QUERY PLAN ------------------------------------------------------------------------------------------- HashAggregate (cost=245833.38..245834.38 rows=100 width=4) Group Key: bid -> Parallel Seq Scan on pgbench_accounts (cost=0.00..220833.38 rows=10000000 width=4) Number of Workers: 8 Number of Blocks Per Workers: 208333 (5 rows) Time: 7.476 ms ➤ psql://thom@[local]:5488/pgbench2 # explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: Failed. Time: 14897.991 ms The logs say: 2014-12-31 15:21:42 GMT [9164]: [240-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [241-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [242-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [243-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [244-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [245-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [246-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [247-1] user=,db=,client= LOG: registering background worker "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [248-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [249-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [250-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [251-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [252-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [253-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [254-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:42 GMT [9164]: [255-1] user=,db=,client= LOG: starting background worker process "backend_worker" 2014-12-31 15:21:46 GMT [9164]: [256-1] user=,db=,client= LOG: worker process: backend_worker (PID 10887) exited with exit code 1 2014-12-31 15:21:46 GMT [9164]: [257-1] user=,db=,client= LOG: unregistering background worker "backend_worker" 2014-12-31 15:21:50 GMT [9164]: [258-1] user=,db=,client= LOG: worker process: backend_worker (PID 10888) exited with exit code 1 2014-12-31 15:21:50 GMT [9164]: [259-1] user=,db=,client= LOG: unregistering background worker "backend_worker" 2014-12-31 15:21:57 GMT [9164]: [260-1] user=,db=,client= LOG: server process (PID 10869) was terminated by signal 9: Killed 2014-12-31 15:21:57 GMT [9164]: [261-1] user=,db=,client= DETAIL: Failed process was running: explain (analyse, buffers, verbose) select distinct bid from pgbench_accounts; 2014-12-31 15:21:57 GMT [9164]: [262-1] user=,db=,client= LOG: terminating any other active server processes Running it again, I get the same issue. This is with parallel_seqscan_degree set to 8, and the crash occurs with 4 and 2 too. This doesn't happen if I set the pgbench scale to 50. I suspect this is a OOM issue. My laptop has 16GB RAM, the table is around 13GB at scale 100, and I don't have swap enabled. But I'm concerned it crashes the whole instance. I also notice that requesting BUFFERS in a parallel EXPLAIN output yields no such information. Is that not possible to report? -- Thom