On 11 November 2015 at 19:26, Robert Haas <robertmh...@gmail.com> wrote:
> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> I have a first query
>>
>> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
>> different
>
> Hmm, I see I was right about people finding more bugs once this was
> committed.  That didn't take long.
>
> There's supposed to be code to handle this - see the
> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
> evidently a few bricks shy of a load.
> ExecParallelReportInstrumentation is supposed to transfer the counts
> from each worker to the DSM:
>
>         ps_instrument = &instrumentation->ps_instrument[i];
>         SpinLockAcquire(&ps_instrument->mutex);
>         InstrAggNode(&ps_instrument->instr, planstate->instrument);
>         SpinLockRelease(&ps_instrument->mutex);
>
> And ExecParallelRetrieveInstrumentation is supposed to slurp those
> counts back into the leader's PlanState objects:
>
>         /* No need to acquire the spinlock here; workers have exited already. */
>         ps_instrument = &instrumentation->ps_instrument[i];
>         InstrAggNode(planstate->instrument, &ps_instrument->instr);
>
> This might be a race condition, or it might be just wrong logic.
> Could you test what happens if you insert something like a 1-second
> sleep in ExecParallelFinish just after the call to
> WaitForParallelWorkersToFinish()?  If that makes the results
> consistent, this is a race.  If it doesn't, something else is wrong:
> then it would be useful to know whether the workers are actually
> calling ExecParallelReportInstrumentation, and whether the leader is
> actually calling ExecParallelRetrieveInstrumentation, and if so
> whether they are doing it for the correct set of nodes.
Hmm.. I made the change, but clearly it's not sleeping properly with my
change (I'm expecting a total runtime in excess of 1 second):

max_parallel_degree = 4:

# explain (analyse, buffers, timing, verbose, costs) select count(*) from js where content->'tags'->0->>'term' like 'design%' or content->'tags'->0->>'term' like 'web%';
                                                                            QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual time=797.518..797.518 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174883 read=540
   ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual time=0.245..784.959 rows=55151 loops=1)
         Output: content
         Number of Workers: 4
         Buffers: shared hit=174883 read=540
         ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93 rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
               Output: content
               Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
               Rows Removed by Filter: 2051330
               Buffers: shared hit=299224 read=907
 Planning time: 0.086 ms
 Execution time: 803.026 ms

max_parallel_degree = 0:

# explain (analyse, buffers, timing, verbose, costs) select count(*) from js where content->'tags'->0->>'term' like 'design%' or content->'tags'->0->>'term' like 'web%';
                                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual time=1278.717..1278.717 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=174671 read=572
   ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500 width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
         Output: content
         Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
         Rows Removed by Filter: 1197822
         Buffers: shared hit=174671 read=572
 Planning time: 0.064 ms
 Execution time: 1278.741 ms
(10 rows)

Time: 1279.145 ms

I did, however, notice that repeated runs of the query with
max_parallel_degree = 4 yield different counts of rows removed by
filter:

Run 1: 2051330
Run 2: 2081252
Run 3: 2065112
Run 4: 2022045
Run 5: 2025384
Run 6: 2059360
Run 7: 2079620
Run 8: 2058541

-- 
Thom

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers