hayman42 opened a new issue, #1382: URL: https://github.com/apache/datafusion-comet/issues/1382
### Describe the bug First of all, thank you guys for such a great project. I am currently doing some research to see if our team can make use of datafusion comet to our workload. As mentioned in [hashjoin](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/joins/hash_join.rs#L176), it is important to keep build side table as small as possible. I am not sure if it is intended, but anyway current comet's implementation always chooses BuildRight unless it is impossible to build right. This causes performance regression for query like tpch q9. Additionally, I tried to make some modifications to RewriteJoin so that build side selection works based on size of each table, but then other bugs happen. Below are the metrics from CometHashJoin. Before BuildRight is selected even if right table is much larger <img width="900" alt="Image" src="https://github.com/user-attachments/assets/c922970f-7185-4b6c-9022-b3e94518e11a" /> After modification (I refered Gluten's source code https://github.com/hayman42/datafusion-comet/blob/main/spark/src/main/scala/org/apache/comet/rules/RewriteJoin.scala) BuildLeft is selected and as a result CometHashJoin has become faster <img width="899" alt="Image" src="https://github.com/user-attachments/assets/07f23217-e1cc-4b88-bc27-3da2915e0293" /> but afterwards I could find other bugs which make the job even slower. I just made changes to it without deep understanding of this project so I think that is the reason why. ### Steps to reproduce Run tpch SF200 q9 with following configs ``` --master yarn\ --deploy-mode cluster\ --driver-memory 4G\ --executor-memory 10G\ --executor-cores 4\ --num-executors 8\ --conf spark.executor.memoryOverhead=0g \ --conf spark.sql.shuffle.partitions=2000 \ --conf spark.eventLog.enabled=true \ --conf spark.memory.offHeap.enabled=true \ --conf spark.memory.offHeap.size=7g \ --conf spark.sql.extensions=org.apache.comet.CometSparkSessionExtensions \ --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \ --conf spark.comet.explainFallback.enabled=true \ --conf spark.comet.exec.shuffle.mode=auto \ --conf spark.comet.exec.shuffle.compression.codec=lz4 \ --conf spark.comet.exec.shuffle.fallbackToColumnar=true \ --conf spark.comet.exec.sort.enabled=true \ --conf spark.comet.exec.replaceSortMergeJoin=true ``` Most of our workload is TB~PB scale, so I used multiple executors to test scalability. I added `--conf spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold=64MB` for vanilla spark test to always enable SHJ. ### Expected behavior Ideally it should behave like vanilla spark's SHJ. Query with Spark SHJ is almost 8x faster in the setting above ### Additional context Below are the details for each setting <details> <summary>Comet (BuildRight - BuildRight - BuildRight)</summary>  </details> <details> <summary>Spark (BuildLeft - BuildRight - BuildLeft)</summary>  </details> <details> <summary>Comet with custom RewriteRule (BuildLeft - BuildRight - BuildRight(left table has become larger for unknown reason))</summary>  </details> -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org