songwdfu commented on PR #16277:
URL: https://github.com/apache/pinot/pull/16277#issuecomment-3047291291

   On second thought, there are several ways to combine the segment results:
   
   1. Worker threads write every segment's result into a single IndexedTable 
(also in this PR). This is clearly not scalable. 
   2. Do an N-way merge of segment results on the main thread once all segments 
are processed. This easily guarantees logarithmic complexity, but it can only 
start after all segments are processed. The merge is also single-threaded, 
leaving the worker threads unused. 
   3. Do a **pair-wise merge** of segment results, reusing the worker thread 
pool. This can start as soon as some segments produce results, but it requires 
extra synchronization and bookkeeping to ensure that only results of the same 
level are merged with each other.
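
   To make option 2 concrete, here is a minimal sketch of a single-threaded 
N-way merge. It assumes each segment result is already sorted by group key 
(modelled as a `TreeMap<String, Long>`) and that values are combined by 
summation. `NWayMerge` and `Cursor` are hypothetical names for illustration, 
not Pinot classes.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch of option 2; not Pinot code.
class NWayMerge {
  // Tracks the current (smallest unread) entry of one sorted segment result.
  private static final class Cursor {
    final Iterator<Map.Entry<String, Long>> it;
    Map.Entry<String, Long> head;

    Cursor(Iterator<Map.Entry<String, Long>> it) {
      this.it = it;
      this.head = it.next(); // only constructed for non-empty segments
    }

    boolean advance() {
      if (!it.hasNext()) {
        return false;
      }
      head = it.next();
      return true;
    }
  }

  // Merges N sorted segment results; each entry costs O(log N) heap work.
  static Map<String, Long> merge(List<TreeMap<String, Long>> segments) {
    PriorityQueue<Cursor> heap =
        new PriorityQueue<>((x, y) -> x.head.getKey().compareTo(y.head.getKey()));
    for (TreeMap<String, Long> segment : segments) {
      if (!segment.isEmpty()) {
        heap.add(new Cursor(segment.entrySet().iterator()));
      }
    }
    // Keys leave the heap in sorted order, so insertion order is sorted order.
    Map<String, Long> merged = new LinkedHashMap<>();
    while (!heap.isEmpty()) {
      Cursor cursor = heap.poll();
      merged.merge(cursor.head.getKey(), cursor.head.getValue(), Long::sum);
      if (cursor.advance()) {
        heap.add(cursor); // re-insert with its next entry as the new head
      }
    }
    return merged;
  }
}
```

   The whole loop runs on the calling thread, which is the drawback noted in 
option 2: it cannot begin until every segment result is available.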
   
   IMHO, 3 might be the most efficient way of doing this, if done right. To my 
understanding, once all segments are processed, the worker threads in 
executorService become idle and could be reused for combining. Is this 
correct?  
   Then, once two results of the same level have been produced, a merge (with 
trimming) job could be created and submitted to the same executorService as a 
Runnable, which allows higher concurrency. Please correct me if I'm wrong.
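
   A rough sketch of what option 3 could look like, under several assumptions: 
segment results are modelled as `Map<String, Long>` combined by summation, 
trimming keeps the `trimSize` largest values, and error handling is omitted 
(a failing task would leave the latch waiting). `PairwiseCombiner`, `offer` 
and `pending` are hypothetical names, not existing Pinot APIs. The bookkeeping 
is one "waiting" slot per level: a result either parks in its level's slot or 
pairs with the parked result and submits a merge Runnable to the same executor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.function.Supplier;

// Hypothetical sketch of option 3; not Pinot code. Single use per query.
class PairwiseCombiner {
  private final ExecutorService executor;
  private final List<Supplier<Map<String, Long>>> segments;
  private final int trimSize;
  // pending.get(level) holds at most one result waiting for a same-level partner.
  private final List<Map<String, Long>> pending = new ArrayList<>();
  private final CountDownLatch done;

  PairwiseCombiner(ExecutorService executor,
      List<Supplier<Map<String, Long>>> segments, int trimSize) {
    this.executor = executor;
    this.segments = segments;
    this.trimSize = trimSize;
    int n = segments.size();
    int levels = 32 - Integer.numberOfLeadingZeros(n); // floor(log2(n)) + 1 for n > 0
    for (int i = 0; i < levels; i++) {
      pending.add(null);
    }
    // n segment tasks plus n - bitCount(n) same-level merges, like carries
    // when incrementing a binary counter n times.
    this.done = new CountDownLatch(2 * n - Integer.bitCount(n));
  }

  Map<String, Long> combine() {
    for (Supplier<Map<String, Long>> segment : segments) {
      executor.execute(() -> {
        offer(segment.get(), 0); // a fresh segment result is level 0
        done.countDown();
      });
    }
    try {
      done.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while combining", e);
    }
    // At most one leftover per level remains; fold them on the calling thread.
    Map<String, Long> result = new HashMap<>();
    for (Map<String, Long> leftover : pending) {
      if (leftover != null) {
        result = merge(result, leftover);
      }
    }
    return result;
  }

  // Park the result, or pair it with a parked same-level result and submit a merge.
  private void offer(Map<String, Long> result, int level) {
    Map<String, Long> partner;
    synchronized (pending) {
      partner = pending.get(level);
      pending.set(level, partner == null ? result : null);
    }
    if (partner != null) {
      final Map<String, Long> other = partner;
      executor.execute(() -> {
        offer(merge(other, result), level + 1);
        done.countDown();
      });
    }
  }

  // Sum values per key, then trim to the trimSize largest values.
  // Note: trimming intermediate results can make the final answer approximate.
  private Map<String, Long> merge(Map<String, Long> a, Map<String, Long> b) {
    Map<String, Long> merged = new HashMap<>(a);
    b.forEach((key, value) -> merged.merge(key, value, Long::sum));
    if (merged.size() <= trimSize) {
      return merged;
    }
    Map<String, Long> trimmed = new HashMap<>();
    merged.entrySet().stream()
        .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
        .limit(trimSize)
        .forEach(e -> trimmed.put(e.getKey(), e.getValue()));
    return trimmed;
  }
}
```

   In this sketch merges start as soon as two level-0 results exist, so they 
overlap with segment processing rather than waiting for idle workers. No task 
blocks on another task, so sharing one fixed-size pool should not deadlock; 
only the caller of `combine()` waits.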


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

