zhangyue19921010 commented on PR #17827:
URL: https://github.com/apache/hudi/pull/17827#issuecomment-4167135419

   [Technical Analysis of the Loser Tree Algorithm for Multi-way Merge.pdf](https://github.com/user-attachments/files/26394582/Technical.Analysis.of.the.Loser.Tree.Algorithm.for.Multi-way.Merge.pdf)
   First of all, sorry for the late reply. The PDF attached above describes the loser-tree-based multi-way merge sort algorithm in detail.
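As a rough illustration of the technique the attachment covers, here is a minimal Java sketch of a loser-tree k-way merge over sorted `int` arrays. It is a hypothetical sketch, not Hudi's implementation: the names `LoserTreeMerge` and `mergeAll` are invented for illustration, and real runs would hold records compared by key rather than plain ints.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a loser tree merging k sorted int runs into one sorted output.
final class LoserTreeMerge {
    private final int[][] runs; // k sorted input runs (the leaves)
    private final int[] pos;    // next unread index in each run
    private final int[] tree;   // tree[0] = current winner; tree[1..k-1] = loser kept at each internal node
    private final int k;

    LoserTreeMerge(int[][] runs) {
        this.runs = runs;
        this.k = runs.length;
        this.pos = new int[k];
        this.tree = new int[Math.max(k, 1)];
        if (k > 0) build();
    }

    // An exhausted run acts as +infinity, so it never beats a live run.
    private boolean exhausted(int r) {
        return pos[r] >= runs[r].length;
    }

    // True if run a's current head should be emitted before run b's.
    private boolean beats(int a, int b) {
        if (exhausted(a)) return false;
        if (exhausted(b)) return true;
        return runs[a][pos[a]] <= runs[b][pos[b]];
    }

    // Leaf i sits at node k + i; internal node j has children 2j and 2j+1.
    // Play every match once bottom-up: the loser stays at the node, the winner moves up.
    private void build() {
        int[] winner = new int[2 * k];
        for (int i = 0; i < k; i++) winner[k + i] = i;
        for (int j = k - 1; j >= 1; j--) {
            int a = winner[2 * j], b = winner[2 * j + 1];
            if (beats(a, b)) { winner[j] = a; tree[j] = b; }
            else             { winner[j] = b; tree[j] = a; }
        }
        tree[0] = winner[1];
    }

    // After run w advances, replay only the matches on the path from leaf w to the root,
    // comparing against the stored loser at each level.
    private void replay(int w) {
        int winner = w;
        for (int node = (k + w) / 2; node >= 1; node /= 2) {
            if (beats(tree[node], winner)) {
                int tmp = tree[node];
                tree[node] = winner;
                winner = tmp;
            }
        }
        tree[0] = winner;
    }

    int[] mergeAll() {
        List<Integer> out = new ArrayList<>();
        while (k > 0 && !exhausted(tree[0])) {
            int w = tree[0];
            out.add(runs[w][pos[w]++]);
            replay(w);
        }
        return out.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        int[][] runs = {{1, 4, 9}, {2, 3, 10}, {}, {5, 6}};
        System.out.println(Arrays.toString(new LoserTreeMerge(runs).mergeAll()));
    }
}
```

Because each internal node remembers the loser of its match, emitting one element costs roughly one comparison per tree level (about log2 k), whereas a binary-heap sift-down typically needs two comparisons per level.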
   
   Note that in practice we found it necessary to deduplicate each batch of data locally during the write process.
   
   This has two advantages:
   1. There is no need to perform a heavy clone operation on every record during the multi-way merge.
   2. Deduplicating at write time means each reader no longer has to repeat the deduplication work.
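A minimal sketch of per-batch, write-time deduplication, assuming each record carries a key and an ordering value where the larger value wins (ties go to the later record). The `Rec` type, its fields, this tie-breaking rule, and `WriteTimeDedup.dedupAndSort` are all hypothetical and are not Hudi's API. Each batch is collapsed to one record per key and then sorted, so every written run is sorted and duplicate-free before any multi-way merge reads it. Requires Java 16+ for `record`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of write-time local deduplication for one batch.
final class WriteTimeDedup {
    // Hypothetical record shape: key, ordering value (larger wins), payload.
    record Rec(String key, long orderingVal, String payload) {}

    // Keep one record per key (largest orderingVal; on ties the later record), then sort by key.
    static List<Rec> dedupAndSort(List<Rec> batch) {
        Map<String, Rec> latest = new HashMap<>();
        for (Rec r : batch) {
            latest.merge(r.key(), r,
                (old, cur) -> cur.orderingVal() >= old.orderingVal() ? cur : old);
        }
        List<Rec> run = new ArrayList<>(latest.values());
        run.sort(Comparator.comparing(Rec::key));
        return run;
    }

    public static void main(String[] args) {
        List<Rec> batch = List.of(
            new Rec("b", 1, "b-v1"),
            new Rec("a", 5, "a-v5"),
            new Rec("b", 3, "b-v3"),
            new Rec("a", 2, "a-v2"));
        System.out.println(dedupAndSort(batch));
    }
}
```

With duplicates already removed inside each run, a reader merging several runs only has to resolve keys that collide across runs, not within a single run.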


-- 
This is an automated message from the Apache Git Service.