HappenLee opened a new issue #4394: URL: https://github.com/apache/incubator-doris/issues/4394
## Motivation At present, Doris support 3 type join: **shuffle join**, **broadcast join**, **colocate join**. Except colocate join,another join will lead to a lot of network consumption. For example, there a SQL A join B, the cost of network. * **broadcast join**: if table A is divided into three parts,the net work cost is ``` 3B``` * **shuffle join**: the network cost is ```A + B```. These network consumption not only leads to slow query, but also leads to extra memory consumption during join. Each Doris table have disrtribute info, if the join expr hit the distribute info, we should use the distribute info to reduce the network consumption. ## What is bucket shuffle join  just like Hive's bucket map join, the picture show how it work. if there a SQL A join B, and the join expr hit the distribute info of A. Bucket shuffle join only need distribute table B, sent the data to proper table A part. So the network cost is always ```B```. So compared with the original join, obviously bucket shuffle join lead to less network cost: B < min(3B, A + B) ### It can bring us the following benefits: 1. First, Bucket Shuffle Join reduce the network cost and lead to a better performance for some join. Especially when the bucket is cropped. 2. It does not strongly rely on the mechanism of collocate, so it is transparent to users. There is no mandatory requirement for data distribution, which will not lead to data skew. 3. It can provide more query optimization space for join reorder. ## POC of Bucket Shuffle Join Now I've implemented a simple Bucket Shuffle join in Doris and test the performance of it. Now, we chose tpcds query 57. The query have 6 join operation, and 4 of them can hit Bucket shuffle join. | | Origin Doris | Bucket shuffle join | | :----: | :----: | :----: | | Time Cost | 27.7s | 16.4s | It seems to work as well as we expected. I'll do more experiments to verify its performance in the future ## Implementation 1. First, we should add a partition type in thrift type 2. FE able to plan and sense queries that can be used bucket shuffle join. send data distribution info to BE 3. BE use the proper hash function to send proper data to proper instance of BE. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org