andygrove commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2675735322
> It is weird that most of the metrics including spill size and execution time get 7-8x higher. I don't know why it happens but I am trying to figure out.

```
shuffle records written: 65,254,713
number of spills: 17,160
spilled bytes: 16,134,291,652,608
shuffle bytes written total (min, med, max ) 5.6 GiB (2.1 MiB, 8.7 MiB, 9.2 MiB )
```

We found that the spilled bytes metric is incorrect - https://github.com/apache/datafusion-comet/issues/1437

Also, in the above example, it is creating 17,160 temporary shuffle files, which seems very inefficient. We are going to work on optimizing this now.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
For additional commands, e-mail: github-h...@datafusion.apache.org
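A rough sanity check on the figures quoted in the comment may help show why the spilled bytes value looks wrong and why 17,160 spill files looks wasteful. The sketch below only does arithmetic on the numbers from the metrics block. It assumes that all three values are totals for the same shuffle stage and that GiB means 2^30 bytes; the variable names are illustrative and do not come from Comet.

```python
# Figures copied from the metrics block in the comment.
spilled_bytes = 16_134_291_652_608   # "spilled bytes"
num_spills = 17_160                  # "number of spills"
shuffle_written_gib = 5.6            # "shuffle bytes written total"

# Assumption: binary units (1 GiB = 2**30 bytes, 1 TiB = 2**40 bytes).
GIB = 1024 ** 3
TIB = 1024 ** 4

# Reported spill volume expressed in TiB.
spilled_tib = spilled_bytes / TIB

# How many times larger the reported spill volume is than the final shuffle output.
shuffle_written_bytes = shuffle_written_gib * GIB
ratio = spilled_bytes / shuffle_written_bytes

# Average amount of final shuffle output per spill file, in KiB.
written_per_spill_kib = shuffle_written_bytes / num_spills / 1024

print(f"reported spilled bytes: {spilled_tib:.2f} TiB")
print(f"spilled bytes / shuffle bytes written: {ratio:,.0f}x")
print(f"shuffle bytes written per spill: {written_per_spill_kib:,.0f} KiB")
```

Under these assumptions the reported spill volume is in the tens of TiB, thousands of times the 5.6 GiB actually written, which is hard to reconcile with a real workload. The shuffle output is likely compressed, so the comparison is only an order-of-magnitude one. The last figure shows that each temporary file accounts for only a few hundred KiB of final output on average.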