andygrove commented on issue #1382:
URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2675735322

   > It is weird that most of the metrics including spill size and execution time get 7-8x higher. I don't know why it happens but I am trying to figure out.
   
   ```
   shuffle records written: 65,254,713
   number of spills: 17,160
   spilled bytes: 16,134,291,652,608
   shuffle bytes written total (min, med, max )
   5.6 GiB (2.1 MiB, 8.7 MiB, 9.2 MiB )
   ```
   
   We found that the spilled bytes metric is incorrect; see https://github.com/apache/datafusion-comet/issues/1437
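
As a rough sanity check on the figures quoted above, the sketch below compares the reported spilled bytes against the shuffle bytes actually written. It only does arithmetic on the posted numbers and says nothing about how Comet computes the metric. The `summarize` helper and its field names are hypothetical, and 5.6 GiB is the rounded value shown in the metrics, so the results are approximate.

```python
# Figures copied from the metrics quoted in this comment.
records_written = 65_254_713
num_spills = 17_160
spilled_bytes = 16_134_291_652_608
shuffle_bytes_written = 5.6 * 2**30  # 5.6 GiB, rounded as displayed


def summarize(records, spills, spilled, written):
    """Hypothetical helper: derive a few ratios from the raw counters."""
    return {
        "spilled_to_written_ratio": spilled / written,
        "spilled_bytes_per_record": spilled / records,
        "written_bytes_per_record": written / records,
        "records_per_spill": records / spills,
    }


summary = summarize(
    records_written, num_spills, spilled_bytes, shuffle_bytes_written
)

for name, value in summary.items():
    print(f"{name}: {value:,.1f}")
```

The reported spilled bytes come out thousands of times larger than the shuffle bytes written, and the implied spilled size per record is far larger than the written size per record, which is consistent with the metric being wrong rather than with that much data actually being spilled.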
   
   Also, in the above example, the shuffle is creating 17,160 temporary shuffle files
   (one per spill), which seems very inefficient. We are going to work on optimizing this now.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org