songwdfu commented on PR #16277: URL: https://github.com/apache/pinot/pull/16277#issuecomment-3047291291
On second thought, there are several ways to combine the segment results:

1. Each worker thread writes every segment's result into a single `IndexedTable` (also done in this PR). This is clearly not scalable.
2. Do an N-way merge of the segment results on the main thread once all segments are processed. This easily guarantees logarithmic complexity, but the merge can only start after all segments are processed. It is also single-threaded, leaving the worker threads unused.
3. Do a **pair-wise merge** of the segment results, reusing the worker thread pool. This can start as soon as some segments have produced results, but it requires extra synchronization and bookkeeping to ensure that only results of the same level are merged with each other.

IMHO, 3 might be the most efficient approach, if done right. To my understanding, once all segments are processed, the worker threads in `executorService` become idle and could be reused for combining. Is this correct? Then, once two results of the same level have been produced, a merge job (with trimming) could be created and submitted to the same `executorService` as a `Runnable`, which allows higher concurrency.

Please correct me if I'm wrong.
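A minimal sketch of option 3, under stated assumptions: `PairwiseMerger`, `onSegmentResult`, and the `mergeAndTrim` operator are hypothetical names, not Pinot APIs, and the merge function is assumed to be associative and commutative (for a group-by it would stand in for "merge two tables, then trim"). Segment results enter at level 0. A result that finds a parked result of the same level is paired with it, and the merge is submitted to the shared pool as a `Runnable`, producing a result one level higher. A level-k result therefore covers 2^k segments. The lock guards only the bookkeeping; the merges themselves run outside it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BinaryOperator;

/** Hypothetical level-based pair-wise merger (option 3); not Pinot's actual API. */
final class PairwiseMerger<T> {
  private final ExecutorService executor;
  private final BinaryOperator<T> mergeAndTrim; // assumed associative and commutative
  private final Map<Integer, T> waiting = new HashMap<>(); // at most one parked result per level
  private final CompletableFuture<T> done = new CompletableFuture<>();
  private int segmentsRemaining;
  private int mergesInFlight;

  PairwiseMerger(ExecutorService executor, int numSegments, BinaryOperator<T> mergeAndTrim) {
    if (numSegments <= 0) {
      throw new IllegalArgumentException("numSegments must be positive");
    }
    this.executor = executor;
    this.segmentsRemaining = numSegments;
    this.mergeAndTrim = mergeAndTrim;
  }

  CompletableFuture<T> result() {
    return done;
  }

  /** Called by a worker thread after it finishes one segment; the result enters at level 0. */
  synchronized void onSegmentResult(T segmentResult) {
    segmentsRemaining--;
    offer(0, segmentResult);
  }

  // Caller holds the lock. Pair with a parked result of the same level, or park this one.
  private void offer(int level, T result) {
    T other = waiting.remove(level);
    if (other == null) {
      waiting.put(level, result);
      maybeFinish();
      return;
    }
    mergesInFlight++;
    // The merge runs on the shared pool as a Runnable, outside the lock.
    executor.execute(() -> {
      try {
        T merged = mergeAndTrim.apply(other, result);
        synchronized (this) {
          mergesInFlight--;
          offer(level + 1, merged);
        }
      } catch (RuntimeException e) {
        done.completeExceptionally(e);
      }
    });
  }

  // Caller holds the lock. Once nothing is pending, the leftovers sit at distinct levels
  // (the set bits of numSegments, e.g. 5 -> levels 2 and 0) and are folded here.
  private void maybeFinish() {
    if (segmentsRemaining > 0 || mergesInFlight > 0) {
      return;
    }
    T acc = null;
    for (T r : waiting.values()) {
      acc = (acc == null) ? r : mergeAndTrim.apply(acc, r);
    }
    waiting.clear();
    done.complete(acc);
  }
}

public class PairwiseMergeDemo {
  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    int numSegments = 10;
    // Long::sum stands in for "merge two group-by tables, then trim".
    PairwiseMerger<Long> merger = new PairwiseMerger<>(pool, numSegments, Long::sum);
    for (long i = 1; i <= numSegments; i++) {
      long segmentResult = i; // stand-in for one segment's partial aggregate
      pool.execute(() -> merger.onSegmentResult(segmentResult));
    }
    System.out.println(merger.result().join()); // prints 55
    pool.shutdown();
  }
}
```

In this sketch, segment tasks and merge jobs share one pool without deadlocking because no task ever blocks waiting on another. When N is not a power of two, at most about log2(N) leftover results remain at the end. These are folded on whichever thread finishes last, so only that short tail is single-threaded.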
