Re: [HACKERS] Parallel Seq Scan

Pavel Stehule Wed, 11 Nov 2015 10:12:12 -0800

2015-11-11 19:03 GMT+01:00 Thom Brown <t...@linux.com>:

> On 11 November 2015 at 17:59, Pavel Stehule <pavel.steh...@gmail.com>
> wrote:
> > Hi
> >
> > I have a first query
> >
> > I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> > differen
> >
> > postgres=# set max_parallel_degree to 4;
> > SET
> > Time: 0.717 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> > │                                                          QUERY PLAN
> > │
> >
> ╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> > │ Aggregate  (cost=9282.50..9282.51 rows=1 width=0) (actual
> > time=142.541..142.541 rows=1 loops=1)                               │
> > │   ->  Gather  (cost=1000.00..9270.00 rows=5000 width=0) (actual
> > time=0.633..130.926 rows=100000 loops=1)                      │
> > │         Number of Workers: 2
> > │
> > │         ->  Parallel Seq Scan on xxx  (cost=0.00..7770.00 rows=5000
> > width=0) (actual time=0.052..411.303 rows=169631 loops=1) │
> > │               Filter: ((a % 10) = 0)
> > │
> > │               Rows Removed by Filter: 1526399
> > │
> > │ Planning time: 0.167 ms
> > │
> > │ Execution time: 144.519 ms
> > │
> >
> └───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> > (8 rows)
> >
> > Time: 145.374 ms
> > postgres=# set max_parallel_degree to 1;
> > SET
> > Time: 0.706 ms
> > postgres=# EXPLAIN ANALYZE select count(*) from xxx where a % 10 = 0;
> >
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> > │                                                           QUERY PLAN
> > │
> >
> ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> > │ Aggregate  (cost=14462.50..14462.51 rows=1 width=0) (actual
> > time=163.355..163.355 rows=1 loops=1)                              │
> > │   ->  Gather  (cost=1000.00..14450.00 rows=5000 width=0) (actual
> > time=0.485..152.827 rows=100000 loops=1)                      │
> > │         Number of Workers: 1
> > │
> > │         ->  Parallel Seq Scan on xxx  (cost=0.00..12950.00 rows=5000
> > width=0) (actual time=0.043..309.740 rows=145364 loops=1) │
> > │               Filter: ((a % 10) = 0)
> > │
> > │               Rows Removed by Filter: 1308394
> > │
> > │ Planning time: 0.129 ms
> > │
> > │ Execution time: 165.102 ms
> > │
> >
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> > (8 rows)
> >
> > Rows removed by filter: 1308394 X 1526399. Is it expected?
>
> Yeah, I noticed the same thing, but more pronounced:
>
> With set max_parallel_degree = 4:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>             QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=49575.51..49575.52 rows=1 width=0) (actual
> time=744.267..744.267 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=175423
>    ->  Gather  (cost=1000.00..49544.27 rows=12496 width=0) (actual
> time=0.351..731.662 rows=55151 loops=1)
>          Output: content
>          Number of Workers: 4
>          Buffers: shared hit=175423
>          ->  Parallel Seq Scan on public.js  (cost=0.00..47294.67
> rows=12496 width=0) (actual time=0.030..5912.118 rows=96062 loops=1)
>                Output: content
>                Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>                Rows Removed by Filter: 2085546
>                Buffers: shared hit=305123
>  Planning time: 0.123 ms
>  Execution time: 759.313 ms
> (14 rows)
>
>
> With set max_parallel_degree = 0:
>
> # explain (analyse, buffers, timing, verbose, costs) select count(*)
> from js where content->'tags'->0->>'term' like 'design%' or
> content->'tags'->0->>'term' like 'web%';
>
>          QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=212857.25..212857.26 rows=1 width=0) (actual
> time=1235.082..1235.082 rows=1 loops=1)
>    Output: count(*)
>    Buffers: shared hit=175243
>    ->  Seq Scan on public.js  (cost=0.00..212826.01 rows=12496
> width=0) (actual time=0.019..1228.515 rows=55151 loops=1)
>          Output: content
>          Filter: (((((js.content -> 'tags'::text) -> 0) ->>
> 'term'::text) ~~ 'design%'::text) OR ((((js.content -> 'tags'::text)
> -> 0) ->> 'term'::text) ~~ 'web%'::text))
>          Rows Removed by Filter: 1197822
>          Buffers: shared hit=175243
>  Planning time: 0.064 ms
>  Execution time: 1235.108 ms
> (10 rows)
>
> Time: 1235.517 ms
>
>
> Rows removed: 2085546 vs 1197822
> Buffers hit: 305123 vs 175243
>


yes - the another little bit unclean in EXPLAIN is number of workers. If I
understand to the behave, the query is processed by two processes if
workers in the explain is one.

Regards

Pavel


>
> Thom
>

Re: [HACKERS] Parallel Seq Scan

Reply via email to