On 11 November 2015 at 19:51, Thom Brown <t...@linux.com> wrote:
> On 11 November 2015 at 19:26, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I have a first query.
>>>
>>> I looked at the EXPLAIN ANALYZE output, and the numbers of filtered
>>> rows are different.
>>
>> Hmm, I see I was right about people finding more bugs once this was
>> committed.  That didn't take long.
>>
>> There's supposed to be code to handle this - see the
>> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
>> evidently a few bricks shy of a load.
>> ExecParallelReportInstrumentation is supposed to transfer the counts
>> from each worker to the DSM:
>>
>>     ps_instrument = &instrumentation->ps_instrument[i];
>>     SpinLockAcquire(&ps_instrument->mutex);
>>     InstrAggNode(&ps_instrument->instr, planstate->instrument);
>>     SpinLockRelease(&ps_instrument->mutex);
>>
>> And ExecParallelRetrieveInstrumentation is supposed to slurp those
>> counts back into the leader's PlanState objects:
>>
>>     /* No need to acquire the spinlock here; workers have exited already. */
>>     ps_instrument = &instrumentation->ps_instrument[i];
>>     InstrAggNode(planstate->instrument, &ps_instrument->instr);
>>
>> This might be a race condition, or it might be just wrong logic.
>> Could you test what happens if you insert something like a 1-second
>> sleep in ExecParallelFinish just after the call to
>> WaitForParallelWorkersToFinish()?  If that makes the results
>> consistent, this is a race.  If it doesn't, something else is wrong,
>> and then it would be useful to know whether the workers are actually
>> calling ExecParallelReportInstrumentation, whether the leader is
>> actually calling ExecParallelRetrieveInstrumentation, and if so,
>> whether they are doing it for the correct set of nodes.
>
> Hmm..
> I made the change, but clearly it's not sleeping properly (I'm
> expecting a total runtime in excess of 1 second):
>
> max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual time=797.518..797.518 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=174883 read=540
>    ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual time=0.245..784.959 rows=55151 loops=1)
>          Output: content
>          Number of Workers: 4
>          Buffers: shared hit=174883 read=540
>          ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93 rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
>                Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
>                Rows Removed by Filter: 2051330
>                Buffers: shared hit=299224 read=907
>  Planning time: 0.086 ms
>  Execution time: 803.026 ms
>
>
> max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual time=1278.717..1278.717 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=174671 read=572
>    ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500 width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
>          Output: content
>          Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
>          Rows Removed by Filter: 1197822
>          Buffers: shared hit=174671 read=572
>  Planning time: 0.064 ms
>  Execution time: 1278.741 ms
> (10 rows)
>
> Time: 1279.145 ms
>
> I did, however, notice that repeated runs of the query with
> max_parallel_degree = 4 yield different counts of rows removed by the
> filter:
>
> Run 1: 2051330
> Run 2: 2081252
> Run 3: 2065112
> Run 4: 2022045
> Run 5: 2025384
> Run 6: 2059360
> Run 7: 2079620
> Run 8: 2058541
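For reference, the change I made was essentially just an unconditional sleep
right after the wait in ExecParallelFinish() -- a throwaway debugging hack,
not a patch, and assuming I've put it in the right place:

    /* in ExecParallelFinish(), execParallel.c */

    /* First, wait for the workers to finish. */
    WaitForParallelWorkersToFinish(pei->pcxt);

    /*
     * Debugging hack per Robert's suggestion: give the workers'
     * instrumentation reports time to land in the DSM before the
     * leader reads them back.
     */
    pg_usleep(1000000L);        /* 1 second */

If the leader never reaches this point for these queries, that would itself
be interesting.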
Here's another oddity, with max_parallel_degree = 1:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->>'title' like '%design%';

                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=132489.34..132489.35 rows=1 width=0) (actual time=382.987..382.987 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175288
   ->  Gather  (cost=1000.00..132488.34 rows=401 width=0) (actual time=382.983..382.983 rows=0 loops=1)
         Output: content
         Number of Workers: 1
         Buffers: shared hit=175288
         ->  Parallel Seq Scan on public.js  (cost=0.00..131448.24 rows=401 width=0) (actual time=379.407..1141.437 rows=0 loops=1)
               Output: content
               Filter: (((js.content -> 'tags'::text) ->> 'title'::text) ~~ '%design%'::text)
               Rows Removed by Filter: 1724810
               Buffers: shared hit=241201
 Planning time: 0.104 ms
 Execution time: 403.045 ms
(14 rows)

Time: 403.596 ms

The Parallel Seq Scan node reports an actual time of 1141.437 ms, yet the
total execution time is only 403.045 ms.

And successive runs with max_parallel_degree = 1 also yield a different
number of rows removed by the filter, as well as a different number of
buffers hit:

Run: rows removed / buffers hit
1: 1738517 / 243143
2: 1729361 / 241900
3: 1737168 / 242974
4: 1734440 / 242591

Thom
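P.S. On Robert's other questions, the next thing I'll try is some crude
logging to see whether the workers are actually reaching
ExecParallelReportInstrumentation, whether the leader is reaching
ExecParallelRetrieveInstrumentation, and for which nodes. Something along
these lines (placement approximate, and the log wording is mine):

    /* in ExecParallelReportInstrumentation(), executed by each worker */
    elog(LOG, "worker %d reporting instrumentation for node of type %d",
         ParallelWorkerNumber, (int) nodeTag(planstate->plan));

    /* in ExecParallelRetrieveInstrumentation(), executed by the leader */
    elog(LOG, "leader retrieving instrumentation for node of type %d",
         (int) nodeTag(planstate->plan));

Comparing the two sets of messages should show whether both sides run, and
whether they agree on the set of nodes.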