On Wed, Jun 3, 2020 at 5:32 PM Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> Hi
>
> A Czech Postgres user reported a performance issue related to the speed
> of HashAggregate inside a nested loop.
>
> The speed of 9.6
>
> HashAggregate  (cost=27586.10..27728.66 rows=14256 width=24)
> (actual time=0.003..0.049 rows=39 loops=599203)
>
> The speed of 10.7
>
> HashAggregate  (cost=27336.78..27552.78 rows=21600 width=24)
> (actual time=0.011..0.156 rows=38 loops=597137)
>
> So it looks like HashAgg is about 3x slower - combined with a heavy
> nested loop, that is a real problem.
>
> I wrote a simple benchmark, and it really does look like our hash
> aggregate keeps getting slower from release to release.
>
> create table foo(a int, b int, c int, d int, e int, f int);
> insert into foo select random()*1000, random()*4, random()*4, random()* 2,
> random()*100, random()*100 from generate_series(1,2000000);
>
> analyze foo;
>
> 9.6.7
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
>                                                            QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>  Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=807.485..3364.515 rows=74 loops=1)
>    Filter: (SubPlan 1)
>    Rows Removed by Filter: 499926
>    Buffers: shared hit=12739, temp read=856 written=855
>    SubPlan 1
>      ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.006..0.006 rows=0 loops=500000)
>            Group Key: foo.b, foo.c, foo.d
>            Filter: (count(*) = g.i)
>            Rows Removed by Filter: 75
>            Buffers: shared hit=12739
>            ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.015..139.736 rows=2000000 loops=1)
>                  Buffers: shared hit=12739
>  Planning time: 0.276 ms
>  Execution time: 3365.758 ms
> (14 rows)
>
> 10.9
>
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
>                                                            QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>  Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=825.468..4919.063 rows=74 loops=1)
>    Filter: (SubPlan 1)
>    Rows Removed by Filter: 499926
>    Buffers: shared hit=12739, temp read=856 written=855
>    SubPlan 1
>      ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.009..0.009 rows=0 loops=500000)
>            Group Key: foo.b, foo.c, foo.d
>            Filter: (count(*) = g.i)
>            Rows Removed by Filter: 75
>            Buffers: shared hit=12739
>            ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.025..157.887 rows=2000000 loops=1)
>                  Buffers: shared hit=12739
>  Planning time: 0.829 ms
>  Execution time: 4920.800 ms
> (14 rows)
>
> master
>
>
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
>                                                              QUERY PLAN
> ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
>  Function Scan on generate_series g  (cost=0.00..28869973750.00 rows=250000 width=4) (actual time=901.639..6057.943 rows=74 loops=1)
>    Filter: (SubPlan 1)
>    Rows Removed by Filter: 499926
>    Buffers: shared hit=12739, temp read=855 written=855
>    SubPlan 1
>      ->  HashAggregate  (cost=57739.00..57739.94 rows=1 width=20) (actual time=0.012..0.012 rows=0 loops=500000)
>            Group Key: foo.b, foo.c, foo.d
>            Filter: (count(*) = g.i)
>            Peak Memory Usage: 37 kB
>            Rows Removed by Filter: 75
>            Buffers: shared hit=12739
>            ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.017..262.497 rows=2000000 loops=1)
>                  Buffers: shared hit=12739
>  Planning Time: 0.275 ms
>    Buffers: shared hit=1
>  Execution Time: 6059.266 ms
> (16 rows)
>
> Regards
>

I tried running the same query on half the data size, and the performance is
almost the same. So the performance issue is probably related to
initialization or finalization of the aggregation rather than to the amount
of data.
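
A rough back-of-envelope from the numbers above (just arithmetic on the
quoted timings, sketched here for clarity; nothing is measured fresh):

```python
# Per-node timing from the user's original query: actual time per loop
# of the HashAggregate node, 10.7 vs 9.6 (0.156 ms vs 0.049 ms).
slowdown_user_query = 0.156 / 0.049
print(f"user query HashAgg slowdown: {slowdown_user_query:.2f}x")

# Benchmark: total execution time divided by the 500000 subplan calls
# gives the average cost of one HashAggregate execution (the hash table
# is built once and rescanned, so this is mostly per-call overhead).
loops = 500_000
exec_ms = {"9.6.7": 3365.758, "10.9": 4920.800, "master": 6059.266}
for version, total_ms in exec_ms.items():
    per_call_us = total_ms / loops * 1000
    print(f"{version}: {per_call_us:.2f} us per subplan call")
```

which gives roughly 6.7 µs per call on 9.6.7, 9.8 µs on 10.9, and 12.1 µs on
master - consistent with a constant per-execution overhead growing across
releases, independent of the table size.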

Pavel


>
> Pavel
>
