prashantwason commented on PR #18083:
URL: https://github.com/apache/hudi/pull/18083#issuecomment-4029984387

   @cshuo Thanks for the thoughtful follow-up. You raise a valid point about 
the existing `AppendWriteFunctionWithBIMBufferSort` and 
`AppendWriteFunctionWithDisruptorBufferSort`.
   
   Here's how continuous sorting differs from the existing async approaches:
   
   **Existing approaches (BIM/Disruptor):**
   - Batch sort (O(n log n)) at flush time, but move the sort+write to a 
background thread
   - Require double-buffering (2x memory) or a ring buffer
   - Add threading complexity (synchronization, error propagation, buffer swaps)
   - Sorting still happens as a single O(n log n) burst, just on a different 
thread
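
As a rough illustration of the pattern described above (not the actual `AppendWriteFunctionWithBIMBufferSort` code), a double-buffered batch sorter could look like the sketch below. The class name `DoubleBufferedBatchSorter`, the `capacity` parameter, and the `writer` callback are all hypothetical, and records are plain strings sorted by natural order for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical sketch of async batch sorting with two buffers. */
class DoubleBufferedBatchSorter {
    // Buffer currently receiving records; swapped out when full.
    private List<String> active = new ArrayList<>();
    // Handle to the background sort+write of the previously filled buffer.
    private CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);
    private final int capacity;                   // assumed: max records per buffer
    private final Consumer<List<String>> writer;  // assumed: writes one sorted batch

    DoubleBufferedBatchSorter(int capacity, Consumer<List<String>> writer) {
        this.capacity = capacity;
        this.writer = writer;
    }

    /** O(1) amortized append; no ordering work happens here. */
    void add(String record) {
        active.add(record);
        if (active.size() >= capacity) {
            swapAndFlush();
        }
    }

    /** Hands the full buffer to a background task and starts a fresh one. */
    private void swapAndFlush() {
        // Wait for the previous batch; a background failure surfaces here
        // as a CompletionException (the error-propagation concern).
        inFlight.join();
        List<String> full = active;
        active = new ArrayList<>();               // second buffer: up to 2x memory
        inFlight = CompletableFuture.runAsync(() -> {
            full.sort(Comparator.naturalOrder()); // single O(n log n) burst
            writer.accept(full);
        });
    }

    /** Flushes any remaining records and waits for the background work. */
    void close() {
        if (!active.isEmpty()) {
            swapAndFlush();
        }
        inFlight.join();
    }
}
```

The sort still costs O(n log n) per batch; it is only moved off the ingestion thread, which is why the buffer swap and the `join()` calls are needed.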
   
   **Continuous sorting (this PR):**
   - O(log n) per insert, no batch sort at all
   - Single buffer (no double-buffering overhead)
   - No threading complexity, so it is simpler to reason about and debug
   - Incremental draining: when the buffer fills, the oldest sorted records are 
written immediately
   - Better for latency-sensitive workloads where a predictable per-record cost 
matters more than peak throughput
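
A minimal sketch of the continuous-sort idea, assuming a `TreeMap` keyed by the sort key. This is an illustration, not the code in this PR: `ContinuousSortBuffer`, `capacity`, `drainSize`, and the `writer` callback are hypothetical names, and "oldest sorted records" is interpreted here as the records at the head of the sort order.

```java
import java.util.ArrayDeque;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

/** Hypothetical sketch of a buffer that stays sorted on every insert. */
class ContinuousSortBuffer {
    // Sort key -> records sharing that key, kept in arrival order.
    private final TreeMap<String, ArrayDeque<String>> sorted = new TreeMap<>();
    private final int capacity;            // assumed: max buffered records
    private final int drainSize;           // assumed: records written per drain
    private final Consumer<String> writer; // assumed: writes a single record
    private int size = 0;

    ContinuousSortBuffer(int capacity, int drainSize, Consumer<String> writer) {
        this.capacity = capacity;
        this.drainSize = drainSize;
        this.writer = writer;
    }

    /** O(log n) insert into the tree; no separate sort step is ever needed. */
    void add(String key, String record) {
        sorted.computeIfAbsent(key, k -> new ArrayDeque<>()).add(record);
        size++;
        if (size >= capacity) {
            drain(drainSize); // incremental drain instead of a full flush
        }
    }

    /** Writes everything still buffered, in key order. */
    void flush() {
        drain(size);
    }

    /** Writes up to count records from the head of the sort order. */
    private void drain(int count) {
        int remaining = Math.min(count, size);
        while (remaining > 0) {
            Map.Entry<String, ArrayDeque<String>> head = sorted.firstEntry();
            writer.accept(head.getValue().poll());
            size--;
            remaining--;
            if (head.getValue().isEmpty()) {
                sorted.pollFirstEntry(); // drop keys with no records left
            }
        }
    }
}
```

In this sketch, a record that arrives after a drain can carry a smaller key than one already written, so the output is sorted within each drained run rather than globally.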
   
   The key trade-off is:
   - **BIM/Disruptor** optimize throughput by overlapping sort+write with 
ingestion (async), at the cost of memory and complexity
   - **Continuous sort** optimizes latency predictability by eliminating sort 
spikes entirely (no batch sort), at the cost of higher per-record overhead 
(O(log n) vs O(1) insert)
   
   These approaches are complementary: continuous sorting could potentially be 
combined with async write in the future. This PR adds it as an opt-in 
alternative via `write.buffer.sort.continuous.enabled=true`.
   
   I don't have formal benchmark numbers yet. Would it be helpful if I ran a 
comparison benchmark against the BIM approach to quantify the latency 
distribution differences?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
