hayman42 commented on issue #1382:
URL: 
https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2650539273

   @kazuyukitanimura I am not sure but I think the slowness comes from 
CometExchange that is executed after the join with BuildLeft
   
   this is for original Comet
   
   ```diff
   CometExchange
   
   shuffle records written: 65,254,713
   number of spills: 2,619
   shuffle write time total (min, med, max )
   29.7 s (2 ms, 43 ms, 138 ms )
   number of input batches: 100,000
   records read: 65,254,713
   memory pool time total (min, med, max )
   1.2 m (25 ms, 113 ms, 183 ms )
   local bytes read total (min, med, max )
   513.1 MiB (4.3 MiB, 7.7 MiB, 9.5 MiB )
   fetch wait time total (min, med, max )
   0 ms (0 ms, 0 ms, 0 ms )
   remote bytes read total (min, med, max )
   3.4 GiB (37.6 MiB, 52.2 MiB, 54.1 MiB )
   repartition time total (min, med, max )
   50.8 s (21 ms, 53 ms, 315 ms )
   decoding and decompression time total (min, med, max )
   3.3 m (1.6 s, 2.4 s, 8.6 s )
   local blocks read: 171,154
   spilled bytes: 2,250,148,478,976
   remote blocks read: 1,160,828
   data size total (min, med, max )
   4.4 GiB (4.5 MiB, 6.8 MiB, 7.2 MiB )
   native shuffle writer time total (min, med, max )
   4.8 m (100 ms, 341 ms, 1.5 s )
   number of partitions: 2,000
   encoding and compression time total (min, med, max )
   1.9 m (34 ms, 102 ms, 744 ms )
   remote reqs duration total (min, med, max )
   1.2 m (339 ms, 639 ms, 2.2 s )
   shuffle bytes written total (min, med, max )
   3.9 GiB (2.6 MiB, 6.2 MiB, 6.8 MiB )
   ```
   
   and here is the metric with my change
   
   ```
   CometExchange
   
   shuffle records written: 65,254,713
   number of spills: 17,160
   shuffle write time total (min, med, max )
   3.0 m (2 ms, 243 ms, 878 ms )
   number of input batches: 11,485,874
   records read: 65,254,713
   memory pool time total (min, med, max )
   8.9 m (36 ms, 782 ms, 1.7 s )
   local bytes read total (min, med, max )
   748.1 MiB (1611.3 KiB, 7.9 MiB, 9.8 MiB )
   fetch wait time total (min, med, max )
   0 ms (0 ms, 0 ms, 0 ms )
   remote bytes read total (min, med, max )
   4.8 GiB (12.7 MiB, 52.0 MiB, 55.9 MiB )
   repartition time total (min, med, max )
   4.2 m (62 ms, 294 ms, 2.0 s )
   decoding and decompression time total (min, med, max )
   18.8 m (7.4 s, 9.9 s, 36.4 s )
   local blocks read: 174,515
   spilled bytes: 16,134,291,652,608
   remote blocks read: 1,157,467
   data size total (min, med, max )
   5.9 GiB (6.0 MiB, 9.1 MiB, 9.6 MiB )
   native shuffle writer time total (min, med, max )
   22.1 m (152 ms, 1.7 s, 6.7 s )
   number of partitions: 2,000
   encoding and compression time total (min, med, max )
   3.7 m (21 ms, 244 ms, 2.0 s )
   remote reqs duration total (min, med, max )
   57.0 s (71 ms, 337 ms, 1.6 s )
   shuffle bytes written total (min, med, max )
   5.6 GiB (2.1 MiB, 8.7 MiB, 9.2 MiB )
   ```
   
   It is weird that most of the metrics including spill size and execution time 
get 7-8x higher. I don't know why it happens but I am trying to figure out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to