HappenLee opened a new issue #5467:
URL: https://github.com/apache/incubator-doris/issues/5467


   ## Motivation
   Currently, Doris supports merging and sorting the data sent by the 
underlying sorted nodes on the exchange node to ensure that the final data is 
in order. This is a multi-channel merge process, but when the amount of 
underlying data is too large and there are too many sending nodes, the single 
thread merge sort will become the bottleneck of query performance.
   
   So,We Should Support Parallel Merge in Exchange Node to speed up query. 
   ## Implementation
   
   This picture show the design of parallel merge
   
   
![image.png](https://upload-images.jianshu.io/upload_images/8552201-d8337a1ef0026560.png?imageMogr2/auto-orient/strip%7CimageView2/2/w/1240)
   
   
    There we chose parallel merge, we should make thread execute more parallel 
to minimized the computation of top merger
    TopMerger: have child merger to supplier data
    ChildMerger: have sender queue to supplier data, each merger start a thread 
to merge data firstly
    SenderQueue: the data from other node
    Before parallel merge, if we have 81 sender queue, data is 1000, the 
computation is 1000 * log(81)
    After parallel merge, the computation is MAX(1000 * log(2), 500 * log(41))
   
   
   #### How many child merger and thread
    Now we only support max 3 merge child, each child use 1 thread, because:
    we have N _sender_queue, M merge child. the best way is log(N / M) = M * 
log(M)
    So if N = 8, M = 2
         N = 81, M = 3
            N = 1024, M = 4
    normally the N is lower than 1024, so we chose 8 <= N < 81, M = 2
                                                                                
        N >= 81, M = 3
   
   ## Test
   
   3000kw data test `order by` 
   
   | Merge Sort | Parallel Merge Sort|
   | :-----: | :----: |
   | 3.3s | 1.49s |


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org
For additional commands, e-mail: commits-h...@doris.apache.org

Reply via email to