Re: [I] CometHashJoin always selects BuildRight which causes potential performance regression [datafusion-comet]

via GitHub Mon, 10 Feb 2025 19:54:21 -0800


parthchandra commented on issue #1382:
URL: 
https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649739992


   > [@parthchandra](https://github.com/parthchandra) With Comet shuffle 
disabled, the plan is almost the same as vanilla Spark's because it replaces 
Comet SHJ with Spark SHJ, and thus it preserves Spark's performance.
   > 
   
   Yeah, I was afraid that would be the case. Interesting that Spark gets the 
plan right but it gets messed up with Comet. AFAIK, Comet itself does not do 
any of the build-side planning. Maybe it should, which is what you've tried to 
do here. I'm not the expert on this, I'm afraid. @viirya any thoughts?
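
For context on "build-side planning": a hash join builds an in-memory hash table from one input and probes it with the other, so hashing the smaller input is usually cheaper. The sketch below only illustrates that idea. `JoinSide`, `estimated_bytes`, and `choose_build_side` are hypothetical names, and this is not how Spark or Comet actually select the build side.

```python
from dataclasses import dataclass


@dataclass
class JoinSide:
    """Hypothetical stand-in for one join input and its size estimate."""
    name: str
    estimated_bytes: int


def choose_build_side(left: JoinSide, right: JoinSide) -> str:
    """Pick the side to hash: the one with the smaller size estimate.

    Ties fall back to "BuildRight" (an arbitrary choice for this sketch).
    """
    if left.estimated_bytes < right.estimated_bytes:
        return "BuildLeft"
    return "BuildRight"


small = JoinSide("dimension_table", 50_000_000)
large = JoinSide("fact_table", 20_000_000_000)

# A planner that always picks BuildRight would hash the large table
# when the inputs arrive in this order.
print(choose_build_side(small, large))
```

With the small table on the left, a size-aware choice hashes the left side, whereas a fixed `BuildRight` would hash the much larger input.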
   
   > I have another question regarding your comment. Is it OK to use Comet SMJ 
on a large dataset? Or did you just disable Comet shuffle? I observed that Comet 
SMJ is much slower than Spark's, which is why I am trying to use SHJ.
   
   It should be OK to use Comet SMJ. We may be spilling too soon in Comet SMJ, 
which would cause the slower performance. @kazuyukitanimura any thoughts on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


