parthchandra commented on issue #1382: URL: https://github.com/apache/datafusion-comet/issues/1382#issuecomment-2649739992
> [@parthchandra](https://github.com/parthchandra) With comet shuffle disabled, the plan is almost like vanilla spark's because it replaces comet SHJ to spark SHJ. And thus it preserves spark's performance.

Yeah, I was afraid that would be the case. Interesting that Spark gets the plan right but it gets messed up with Comet. Afaik, Comet itself does not do any of the build side planning. Maybe it should, which is what you've tried to do here. I'm not the expert on this, I'm afraid. @viirya any thoughts?

> I have another question regarding your comment. Is it ok to use comet SMJ to a large dataset? Or did you just disable comet shuffle? I observed comet SMJ is way too slower compared to spark and that is why I am trying to use SHJ.

It should be ok to use Comet SMJ. We may be spilling too soon for Comet SMJ causing the slower performance. @kazuyukitanimura any thoughts on this?