On 10/31/24 08:16, David Rowley wrote:
On Tue, 29 Oct 2024 at 22:47, David Rowley <dgrowle...@gmail.com> wrote:
I've attached an updated patch with a few other fixes. Whilr checking
this tonight, noticed that master does not use
SubPlanState.tab_eq_funcs for anything. I resisted removing that in
this patch. Perhaps a follow-on patch can remove that. I suspect it's
not been used for a long time now, but I didn't do the archaeology
work to find out.

3974bc319 removed the SubPlanState.tab_eq_funcs field, so here's a
rebased patch.
Thanks for sharing this.
I still need to dive deeply into the code. But I have one annoying user case where the user complained about a 4x SQL server speedup in comparison to Postgres, and I guess it is a good benchmark for your code. This query is remarkable because of high grouping computation load. Of course, I can't provide the user's data, but I have prepared a synthetic test to reproduce the case (see attachment). Comparing the master with and without your patch, the first, I see is more extensive usage of memory (see complete explains in the attachment):

Current master:
---------------

Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 width=889)
  (actual time=20621.028..20642.664 rows=10176 loops=9)
  Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
  Batches: 1  Memory Usage: 74513kB


Patched:
--------

Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 width=889)
  (actual time=57213.280..186216.604 rows=10302738 loops=9)
  Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
  Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB

I wonder what causes memory consumption, but it is hard to decide on the patch's positive outcome for now.

--
regards, Andrei Lepikhov
/*
-- Master

 Finalize HashAggregate  (cost=88853.57..89949.01 rows=19917 width=889) (actual 
time=20960.694..20994.725 rows=40000 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 280081kB
   ->  Gather  (cost=55492.60..72521.63 rows=159336 width=889) (actual 
time=20640.902..20680.058 rows=91580 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54492.60..55588.03 rows=19917 
width=889) (actual time=20621.028..20642.664 rows=10176 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 1  Memory Usage: 74513kB
               Worker 0:  Batches: 1  Memory Usage: 66329kB
               Worker 1:  Batches: 1  Memory Usage: 58137kB
               Worker 2:  Batches: 1  Memory Usage: 58137kB
               Worker 3:  Batches: 1  Memory Usage: 66329kB
               Worker 4:  Batches: 1  Memory Usage: 58137kB
               Worker 5:  Batches: 1  Memory Usage: 74521kB
               Worker 6:  Batches: 1  Memory Usage: 58137kB
               Worker 7:  Batches: 1  Memory Usage: 66329kB
               ->  Parallel Hash Join  (cost=4921.00..23223.33 rows=305066 
width=879) (actual time=667.304..5047.261 rows=17361113 loops=9)
                     Hash Cond: ((t1.x1 = t2.x1) AND (t1.x2 = t2.x2) AND (t1.x3 
= t2.x3) AND (t1.x4 = t2.x4))
                     ->  Parallel Seq Scan on t1  (cost=0.00..4671.00 
rows=12500 width=868) (actual time=0.009..7.437 rows=11111 loops=9)
                     ->  Parallel Hash  (cost=4671.00..4671.00 rows=12500 
width=319) (actual time=666.929..666.930 rows=11111 loops=9)
                           Buckets: 131072  Batches: 1  Memory Usage: 35776kB
                           ->  Parallel Seq Scan on t2  (cost=0.00..4671.00 
rows=12500 width=319) (actual time=560.944..562.943 rows=11111 loops=9)
 Planning Time: 1.619 ms
 JIT:
   Functions: 149
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 64.711 ms (Deform 13.706 ms), Inlining 846.944 ms, 
Optimization 2602.412 ms, Emission 1599.170 ms, Total 5113.237 ms
 Execution Time: 21030.501 ms
(29 rows)
 */

/*
-- Master + patch

 Finalize HashAggregate  (cost=89193.21..90292.99 rows=19996 width=889) (actual 
time=469100.662..469136.040 rows=40000 loops=1)
   Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
   Batches: 1  Memory Usage: 8963609kB
   ->  Gather  (cost=55699.91..72796.49 rows=159968 width=889) (actual 
time=58325.864..202369.963 rows=92724640 loops=1)
         Workers Planned: 8
         Workers Launched: 8
         ->  Partial HashAggregate  (cost=54699.91..55799.69 rows=19996 
width=889) (actual time=57213.280..186216.604 rows=10302738 loops=9)
               Group Key: t1.x1, t1.x2, t1.x3, t1.x4, t1.x5
               Batches: 261  Memory Usage: 527905kB  Disk Usage: 4832656kB
               Worker 0:  Batches: 1  Memory Usage: 247321kB
               Worker 1:  Batches: 261  Memory Usage: 527417kB  Disk Usage: 
3850624kB
               Worker 2:  Batches: 261  Memory Usage: 527905kB  Disk Usage: 
4304528kB
               Worker 3:  Batches: 325  Memory Usage: 527417kB  Disk Usage: 
4315120kB
               Worker 4:  Batches: 261  Memory Usage: 527905kB  Disk Usage: 
4403160kB
               Worker 5:  Batches: 261  Memory Usage: 527417kB  Disk Usage: 
4770368kB
               Worker 6:  Batches: 389  Memory Usage: 527417kB  Disk Usage: 
4797952kB
               Worker 7:  Batches: 229  Memory Usage: 527905kB  Disk Usage: 
4116552kB
               ->  Parallel Hash Join  (cost=4921.00..23411.07 rows=305257 
width=879) (actual time=755.402..6741.900 rows=17361113 loops=9)
                     Hash Cond: ((t2.x1 = t1.x1) AND (t2.x2 = t1.x2) AND (t2.x3 
= t1.x3) AND (t2.x4 = t1.x4))
                     ->  Parallel Seq Scan on t2  (cost=0.00..4671.00 
rows=12500 width=319) (actual time=0.016..14.229 rows=11111 loops=9)
                     ->  Parallel Hash  (cost=4671.00..4671.00 rows=12500 
width=868) (actual time=755.027..755.029 rows=11111 loops=9)
                           Buckets: 131072  Batches: 1  Memory Usage: 37536kB
                           ->  Parallel Seq Scan on t1  (cost=0.00..4671.00 
rows=12500 width=868) (actual time=0.006..1.634 rows=11111 loops=9)
 Planning Time: 5.485 ms
 JIT:
   Functions: 174
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 79.608 ms (Deform 33.590 ms), Inlining 1037.222 ms, 
Optimization 3900.018 ms, Emission 2594.133 ms, Total 7610.981 ms
 Execution Time: 470295.518 ms
 */

Attachment: synth.sql
Description: application/sql

Reply via email to