On Wed, Jun 3, 2020 at 5:32 PM Pavel Stehule <pavel.steh...@gmail.com> wrote:
> Hi
>
> One Czech Postgres user reported a performance issue related to the speed
> of HashAggregate in a nested loop.
>
> The speed on 9.6:
>
> HashAggregate  (cost=27586.10..27728.66 rows=14256 width=24)
>                (actual time=0.003..0.049 rows=39 loops=599203)
>
> The speed on 10.7:
>
> HashAggregate  (cost=27336.78..27552.78 rows=21600 width=24)
>                (actual time=0.011..0.156 rows=38 loops=597137)
>
> So it looks like HashAgg is about 3x slower - with a brutal nested loop
> that is a problem.
>
> I wrote a simple benchmark, and it really looks like our hash aggregate
> is getting slower and slower.
>
> create table foo(a int, b int, c int, d int, e int, f int);
> insert into foo select random()*1000, random()*4, random()*4, random()*2,
> random()*100, random()*100 from generate_series(1,2000000);
>
> analyze foo;
>
> 9.6.7
>
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                       QUERY PLAN                                                        │
> ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=807.485..3364.515 rows=74 loops=1)
> │   Filter: (SubPlan 1)
> │   Rows Removed by Filter: 499926
> │   Buffers: shared hit=12739, temp read=856 written=855
> │   SubPlan 1
> │     ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.006..0.006 rows=0 loops=500000)
> │           Group Key: foo.b, foo.c, foo.d
> │           Filter: (count(*) = g.i)
> │           Rows Removed by Filter: 75
> │           Buffers: shared hit=12739
> │           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.015..139.736 rows=2000000 loops=1)
> │                 Buffers: shared hit=12739
> │ Planning time: 0.276 ms
> │ Execution time: 3365.758 ms
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (14 rows)
>
> 10.9
>
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                       QUERY PLAN                                                        │
> ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Function Scan on generate_series g  (cost=0.00..57739020.00 rows=500 width=4) (actual time=825.468..4919.063 rows=74 loops=1)
> │   Filter: (SubPlan 1)
> │   Rows Removed by Filter: 499926
> │   Buffers: shared hit=12739, temp read=856 written=855
> │   SubPlan 1
> │     ->  HashAggregate  (cost=57739.00..57739.75 rows=75 width=20) (actual time=0.009..0.009 rows=0 loops=500000)
> │           Group Key: foo.b, foo.c, foo.d
> │           Filter: (count(*) = g.i)
> │           Rows Removed by Filter: 75
> │           Buffers: shared hit=12739
> │           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.025..157.887 rows=2000000 loops=1)
> │                 Buffers: shared hit=12739
> │ Planning time: 0.829 ms
> │ Execution time: 4920.800 ms
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (14 rows)
>
> master
>
> postgres=# explain (analyze, buffers) select i from
> generate_series(1,500000) g(i) where exists (select count(*) cx from foo
> group by b, c, d having count(*) = i);
>
> ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
> │                                                       QUERY PLAN                                                        │
> ╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
> │ Function Scan on generate_series g  (cost=0.00..28869973750.00 rows=250000 width=4) (actual time=901.639..6057.943 rows=74 loops=1)
> │   Filter: (SubPlan 1)
> │   Rows Removed by Filter: 499926
> │   Buffers: shared hit=12739, temp read=855 written=855
> │   SubPlan 1
> │     ->  HashAggregate  (cost=57739.00..57739.94 rows=1 width=20) (actual time=0.012..0.012 rows=0 loops=500000)
> │           Group Key: foo.b, foo.c, foo.d
> │           Filter: (count(*) = g.i)
> │           Peak Memory Usage: 37 kB
> │           Rows Removed by Filter: 75
> │           Buffers: shared hit=12739
> │           ->  Seq Scan on foo  (cost=0.00..32739.00 rows=2000000 width=12) (actual time=0.017..262.497 rows=2000000 loops=1)
> │                 Buffers: shared hit=12739
> │ Planning Time: 0.275 ms
> │   Buffers: shared hit=1
> │ Execution Time: 6059.266 ms
> └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
> (16 rows)
>
> Regards
>
> Pavel

I tried to run the same query on half the data size, and the performance is almost the same. So the slowdown probably does not come from the amount of data processed; it may be related to the initialization or finalization of the aggregation.
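For reference, a minimal sketch of that half-size run (the table name foo_half is only illustrative; the schema and the probe query are the ones from the benchmark above, just with the row count halved):

create table foo_half (like foo);
insert into foo_half
  select random()*1000, random()*4, random()*4, random()*2,
         random()*100, random()*100
    from generate_series(1, 1000000);  -- half of the original 2000000 rows
analyze foo_half;

explain (analyze, buffers)
  select i
    from generate_series(1, 500000) g(i)
   where exists (select count(*) cx
                   from foo_half
                  group by b, c, d
                 having count(*) = i);

With half the data the subplan is still executed ~500000 times, so any fixed per-execution cost stays the same - which is consistent with the runtime barely changing.

Pavel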