zhangyue19921010 commented on PR #5416:
URL: https://github.com/apache/hudi/pull/5416#issuecomment-1183759671
Hi @alexeykudinkin, thanks a lot for your attention! Glad to have more discussion :)

1. The Disruptor not only performs well in the multi-producer/multi-consumer model, but also performs well in single-producer/single-consumer scenarios thanks to its lock-free design, based on https://github.com/LMAX-Exchange/disruptor/wiki/Performance-Results. The Disruptor documentation also officially recommends the single-producer model: https://lmax-exchange.github.io/disruptor/user-guide/index.html#_introduction

   > One of the best ways to improve performance in concurrent systems is to adhere to the [Single Writer Principle](https://mechanical-sympathy.blogspot.com/2011/09/single-writer-principle.html), this applies to the Disruptor. If you are in the situation where there will only ever be a single thread producing events into the Disruptor, then you can take advantage of this to gain additional performance.

2. I am not sure why it would lead to an OOM `when reader is reading too fast and writing is not able to keep up`. To my limited knowledge, when consumers consume data quickly and producers produce data relatively slowly, consumers simply keep waiting, which hurts the throughput of the application rather than growing memory. What we want is to deliver the producer's data to the consumer side as soon as possible, thereby improving CPU utilization and throughput. This is why we want to use the Disruptor's queue: its lock-free design makes the data handoff more efficient.

3. I also fully agree with your point that the Disruptor cannot solve all problems, nor can it achieve satisfactory optimization results in every scenario. It depends on where the performance bottleneck of the user's Hudi ingestion lies.
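As an aside on the single-writer point above: the core idea can be sketched with plain JDK primitives. This is only an illustrative analogue of an SPSC lock-free queue, not the Disruptor's actual implementation (which adds cache-line padding, wait strategies, batching, etc.) and not Hudi's code; all names here are made up:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Minimal single-producer/single-consumer ring buffer illustrating the
 * Single Writer Principle: each sequence counter has exactly one writing
 * thread, so no locks or CAS retry loops are needed -- only ordered stores.
 */
final class SpscRingBuffer<T> {
    private final Object[] slots;
    private final int mask;                                    // capacity must be a power of two
    private final AtomicLong producerSeq = new AtomicLong(-1); // last published slot
    private final AtomicLong consumerSeq = new AtomicLong(-1); // last consumed slot

    SpscRingBuffer(int capacityPowerOfTwo) {
        slots = new Object[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    /** Called from the single producer thread only; false when the ring is full. */
    boolean offer(T value) {
        long next = producerSeq.get() + 1;
        if (next - consumerSeq.get() > slots.length) {
            return false;                        // would overwrite an unread slot
        }
        slots[(int) (next & mask)] = value;
        producerSeq.lazySet(next);               // ordered store publishes the slot
        return true;
    }

    /** Called from the single consumer thread only; null when the ring is empty. */
    @SuppressWarnings("unchecked")
    T poll() {
        long next = consumerSeq.get() + 1;
        if (next > producerSeq.get()) {
            return null;                         // nothing published yet
        }
        T value = (T) slots[(int) (next & mask)];
        consumerSeq.lazySet(next);               // frees the slot for the producer
        return value;
    }
}
```

Because each `AtomicLong` is only ever advanced by one thread, the hot path is a volatile read plus an ordered write, with no contended lock between producer and consumer.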
For example, in the scenario simulated in the benchmark, if the user's schema is relatively simple, downstream consumption is fast, and the bottleneck is production (or a lot of time is spent waiting for data to become ready), then the Disruptor can deliver the most value. As for the conclusion `avoiding locks in that path will be able to reduce our compute footprint by about ~10%`: glad this will have a positive impact, at least it won't get worse :) Maybe we can run tests across a variety of scenarios, especially ones where the bottleneck is the data production speed, to see how the optimization works.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
