HappenLee opened a new issue #5467: URL: https://github.com/apache/incubator-doris/issues/5467
## Motivation Currently, Doris supports merging and sorting the data sent by the underlying sorted nodes on the exchange node to ensure that the final data is in order. This is a multi-channel merge process, but when the amount of underlying data is too large and there are too many sending nodes, the single thread merge sort will become the bottleneck of query performance. So,We Should Support Parallel Merge in Exchange Node to speed up query. ## Implementation This picture show the design of parallel merge  There we chose parallel merge, we should make thread execute more parallel to minimized the computation of top merger TopMerger: have child merger to supplier data ChildMerger: have sender queue to supplier data, each merger start a thread to merge data firstly SenderQueue: the data from other node Before parallel merge, if we have 81 sender queue, data is 1000, the computation is 1000 * log(81) After parallel merge, the computation is MAX(1000 * log(2), 500 * log(41)) #### How many child merger and thread Now we only support max 3 merge child, each child use 1 thread, because: we have N _sender_queue, M merge child. the best way is log(N / M) = M * log(M) So if N = 8, M = 2 N = 81, M = 3 N = 1024, M = 4 normally the N is lower than 1024, so we chose 8 <= N < 81, M = 2 N >= 81, M = 3 ## Test 3000kw data test `order by` | Merge Sort | Parallel Merge Sort| | :-----: | :----: | | 3.3s | 1.49s | ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org For additional commands, e-mail: commits-h...@doris.apache.org