On 11 November 2015 at 19:51, Thom Brown <t...@linux.com> wrote:
> On 11 November 2015 at 19:26, Robert Haas <robertmh...@gmail.com> wrote:
>> On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>> I have a first query.
>>>
>>> I looked at the EXPLAIN ANALYZE output, and the numbers of filtered
>>> rows are different.
>>
>> Hmm, I see I was right about people finding more bugs once this was
>> committed.  That didn't take long.
>>
>> There's supposed to be code to handle this - see the
>> SharedPlanStateInstrumentation stuff in execParallel.c - but it's
>> evidently a few bricks shy of a load.
>> ExecParallelReportInstrumentation is supposed to transfer the counts
>> from each worker to the DSM:
>>
>>     ps_instrument = &instrumentation->ps_instrument[i];
>>     SpinLockAcquire(&ps_instrument->mutex);
>>     InstrAggNode(&ps_instrument->instr, planstate->instrument);
>>     SpinLockRelease(&ps_instrument->mutex);
>>
>> And ExecParallelRetrieveInstrumentation is supposed to slurp those
>> counts back into the leader's PlanState objects:
>>
>>     /* No need to acquire the spinlock here; workers have exited already. */
>>     ps_instrument = &instrumentation->ps_instrument[i];
>>     InstrAggNode(planstate->instrument, &ps_instrument->instr);
>>
>> This might be a race condition, or it might be just wrong logic.
>> Could you test what happens if you insert something like a 1-second
>> sleep in ExecParallelFinish just after the call to
>> WaitForParallelWorkersToFinish()?  If that makes the results
>> consistent, this is a race.  If it doesn't, something else is wrong,
>> and then it would be useful to know whether the workers are actually
>> calling ExecParallelReportInstrumentation, whether the leader is
>> actually calling ExecParallelRetrieveInstrumentation, and if so,
>> whether they are doing it for the correct set of nodes.
>
> Hmm..
> I made the change, but clearly it's not sleeping properly (I'm
> expecting a total runtime in excess of 1 second):
>
> max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=49578.18..49578.19 rows=1 width=0) (actual time=797.518..797.518 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=174883 read=540
>    ->  Gather  (cost=1000.00..49546.93 rows=12500 width=0) (actual time=0.245..784.959 rows=55151 loops=1)
>          Output: content
>          Number of Workers: 4
>          Buffers: shared hit=174883 read=540
>          ->  Parallel Seq Scan on public.js  (cost=0.00..47296.93 rows=12500 width=0) (actual time=0.019..6153.679 rows=94503 loops=1)
>                Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
>                Rows Removed by Filter: 2051330
>                Buffers: shared hit=299224 read=907
>  Planning time: 0.086 ms
>  Execution time: 803.026 ms
>
>
> max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>                                 QUERY PLAN
> --------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=212867.43..212867.44 rows=1 width=0) (actual time=1278.717..1278.717 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=174671 read=572
>    ->  Seq Scan on public.js  (cost=0.00..212836.18 rows=12500 width=0) (actual time=0.018..1272.030 rows=55151 loops=1)
>          Output: content
>          Filter: (((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text) -> 0) ->> 'term'::text) ~~ 'web%'::text))
>          Rows Removed by Filter: 1197822
>          Buffers: shared hit=174671 read=572
>  Planning time: 0.064 ms
>  Execution time: 1278.741 ms
> (10 rows)
>
> Time: 1279.145 ms
>
> I did, however, notice that repeated runs of the query with
> max_parallel_degree = 4 yield different counts of rows removed by the
> filter:
>
> Run 1: 2051330
> Run 2: 2081252
> Run 3: 2065112
> Run 4: 2022045
> Run 5: 2025384
> Run 6: 2059360
> Run 7: 2079620
> Run 8: 2058541
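For reference, the change I made was essentially just an unconditional sleep
right after the wait in ExecParallelFinish() -- a throwaway debugging hack,
not a patch, and assuming I've put it in the right place:

    /* in ExecParallelFinish(), execParallel.c */

    /* First, wait for the workers to finish. */
    WaitForParallelWorkersToFinish(pei->pcxt);

    /*
     * Debugging hack per Robert's suggestion: give the workers'
     * instrumentation reports time to land in the DSM before the
     * leader reads them back.
     */
    pg_usleep(1000000L);        /* 1 second */

If the leader never reaches this point for these queries, that would itself
be interesting.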
Here's another oddity, with max_parallel_degree = 1:

# explain (analyse, buffers, timing, verbose, costs) select count(*)
from js where content->'tags'->>'title' like '%design%';

                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=132489.34..132489.35 rows=1 width=0) (actual time=382.987..382.987 rows=1 loops=1)
   Output: count(*)
   Buffers: shared hit=175288
   ->  Gather  (cost=1000.00..132488.34 rows=401 width=0) (actual time=382.983..382.983 rows=0 loops=1)
         Output: content
         Number of Workers: 1
         Buffers: shared hit=175288
         ->  Parallel Seq Scan on public.js  (cost=0.00..131448.24 rows=401 width=0) (actual time=379.407..1141.437 rows=0 loops=1)
               Output: content
               Filter: (((js.content -> 'tags'::text) ->> 'title'::text) ~~ '%design%'::text)
               Rows Removed by Filter: 1724810
               Buffers: shared hit=241201
 Planning time: 0.104 ms
 Execution time: 403.045 ms
(14 rows)

Time: 403.596 ms

The Parallel Seq Scan node reports an actual time of 1141.437 ms, yet the
total execution time is only 403.045 ms.

And successive runs with max_parallel_degree = 1 also yield a different
number of rows removed by the filter, as well as a different number of
buffers hit:

Run: rows removed / buffers hit
1: 1738517 / 243143
2: 1729361 / 241900
3: 1737168 / 242974
4: 1734440 / 242591

Thom
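P.S. On Robert's other questions, the next thing I'll try is some crude
logging to see whether the workers are actually reaching
ExecParallelReportInstrumentation, whether the leader is reaching
ExecParallelRetrieveInstrumentation, and for which nodes. Something along
these lines (placement approximate, and the log wording is mine):

    /* in ExecParallelReportInstrumentation(), executed by each worker */
    elog(LOG, "worker %d reporting instrumentation for node of type %d",
         ParallelWorkerNumber, (int) nodeTag(planstate->plan));

    /* in ExecParallelRetrieveInstrumentation(), executed by the leader */
    elog(LOG, "leader retrieving instrumentation for node of type %d",
         (int) nodeTag(planstate->plan));

Comparing the two sets of messages should show whether both sides run, and
whether they agree on the set of nodes.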