Sounds good. Thanks for sharing your findings.

On Sat, May 25, 2024 at 2:24 AM zhangjian <1361320...@qq.com> wrote:

> Hello everyone, I ran a performance comparison between the sync and async
> routers. In both single-ns and multi-ns scenarios, the async router is
> better than the sync router in throughput, CPU and thread utilization, and
> average processing time of client requests. The gap is largest when a
> downstream ns has a performance bottleneck, where the async router far
> outperforms the sync router. The async router also provides better
> isolation than the sync router.
> Detailed testing PDF: https://issues.apache.org/jira/browse/HDFS-17531
>  Comparison of Async router & sync router performance.pdf
>
> On May 24, 2024, at 14:13, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>
> good job!
>
> On Fri, May 24, 2024 at 1:57 AM zhangjian <1361320...@qq.com> wrote:
>
>> Hello everyone, I have now tested the performance of the async and sync
>> routers against one downstream ns:
>> 1. The async router's throughput, CPU usage, and thread usage are better
>> than the sync router's, and its memory usage stays within an acceptable
>> range compared to the sync router.
>> 2. The async router can push enough load downstream to make better use of
>> the downstream ns's capacity, and can almost fill the call queue of the
>> downstream ns.
>>
>> The test result PDF is too large to send by email; please see:
>> https://issues.apache.org/jira/browse/HDFS-17531
>>
>> > On May 23, 2024, at 17:03, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>> >
>> > Great. Thanks for the additional information.
>> >
>> > cc @Ayush Saxena <ayush...@gmail.com> @inigo...@apache.org
>> > <inigo...@apache.org> Any more feedback for this proposal?
>> >
>> > IMO the asynchronous router RPC feature is a helpful improvement. Based
>> > on my internal practice, it will significantly improve the throughput of
>> > request forwarding, and it is very valuable to push it forward.
>> > Thanks again and good luck!
>> >
>> > Best Regards,
>> > - He Xiaoqiao
>> >
>> > On Wed, May 22, 2024 at 9:59 AM zhangjian <1361320...@qq.com> wrote:
>> >
>> >> Hi Sangjin Lee, thank you for your attention. I will use my free time to
>> >> run a performance comparison soon.
>> >>
>> >>> On May 22, 2024, at 03:42, Sangjin Lee <sj...@apache.org> wrote:
>> >>>
>> >>> Thanks for the great proposal, Zhangjian. On point #3, I suspect it should
>> >>> be fairly straightforward to create a small isolated synthetic test to
>> >>> prove (or disprove) the benefits of this approach. By driving a controlled
>> >>> number of requests per second, you could see latency, memory, CPU, etc.
>> >>> Ideally, it should show meaningful improvements without much degradation
>> >>> in other metrics. Would you be able to spend some time doing that?
>> >>>
>> >>> Thanks,
>> >>> Sangjin
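
A synthetic test of the kind suggested above could be driven by something
like the following minimal sketch. It is only an illustration under stated
assumptions: FixedRateLoadSketch, meanLatencyMicros, and the `call` supplier
are hypothetical names, `call` stands in for one client RPC sent to the
router, and only latency is measured (memory and CPU would need separate
tooling).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

public class FixedRateLoadSketch {

    // Issues totalRequests calls at a fixed rate and returns the summed latency
    // divided by totalRequests, in microseconds.
    // `call` is a hypothetical stand-in for one asynchronous client RPC.
    public static double meanLatencyMicros(int requestsPerSecond, int totalRequests,
                                           Supplier<CompletableFuture<?>> call) {
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        LongAdder totalNanos = new LongAdder();
        AtomicInteger issued = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(totalRequests);
        long periodNanos = TimeUnit.SECONDS.toNanos(1) / requestsPerSecond;
        try {
            ticker.scheduleAtFixedRate(() -> {
                if (issued.incrementAndGet() > totalRequests) {
                    return; // enough requests issued; remaining ticks are no-ops
                }
                long start = System.nanoTime();
                try {
                    // Record the latency when the response (or failure) arrives.
                    call.get().whenComplete((result, error) -> {
                        totalNanos.add(System.nanoTime() - start);
                        done.countDown();
                    });
                } catch (RuntimeException e) {
                    done.countDown(); // count failed submissions so the run still ends
                }
            }, 0, periodNanos, TimeUnit.NANOSECONDS);
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for responses", e);
        } finally {
            ticker.shutdownNow();
        }
        return totalNanos.sum() / 1_000.0 / totalRequests;
    }

    public static void main(String[] args) {
        // Stand-in workload: an asynchronous task that sleeps for about 5 ms.
        double mean = meanLatencyMicros(200, 100, () -> CompletableFuture.runAsync(() -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        System.out.printf("mean latency: %.0f us%n", mean);
    }
}
```

Running the same driver at the same rate against a sync and an async router
would give comparable latency numbers for the two models.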
>> >>>
>> >>> On Tue, May 21, 2024 at 5:13 AM zhangjian <1361320...@qq.com.invalid>
>> >>> wrote:
>> >>>
>> >>>> Hi Xiaoqiao He, thank you for your reply.
>> >>>>
>> >>>> 1. Currently, the server and client protocols within the router can be
>> >>>> implemented by extending the existing protocols and adding asynchronous
>> >>>> functionality, so the existing synchronous protocols are not affected:
>> >>>> RouterClientNamenodeProtocolServerSideTranslatorPB
>> >>>> RouterClientProtocolTranslatorPB
>> >>>> RouterGetUserMappingsProtocolServerSideTranslatorPB
>> >>>> RouterGetUserMappingsProtocolTranslatorPB
>> >>>> RouterNamenodeProtocolServerSideTranslatorPB
>> >>>> RouterNamenodeProtocolTranslatorPB
>> >>>> RouterRefreshUserMappingsProtocolServerSideTranslatorPB
>> >>>> RouterRefreshUserMappingsProtocolTranslatorPB
>> >>>>
>> >>>> The following issues implemented asynchronous callbacks for the RPC
>> >>>> server, but I have not found any other modules that use them:
>> >>>> Server: HADOOP-11552, HADOOP-17046
>> >>>> The asynchronous RPC client implementation uses this issue directly:
>> >>>> Client: HADOOP-13226
>> >>>> Therefore, I believe the asynchronous router's changes to the RPC
>> >>>> protocol, RPC server, and client are safe.
>> >>>>
>> >>>> 2. When forwarding a request to multiple downstream ns, the synchronous
>> >>>> router handler submits one request per downstream ns to the thread pool
>> >>>> (RouterRpcClient.executorService) and then waits for the responses from
>> >>>> all downstream ns before returning. Since threads in the thread pool also
>> >>>> process RPC requests synchronously, just like a handler, the number of
>> >>>> threads in the pool directly affects the performance of
>> >>>> invokeConcurrent, which in turn affects the performance of the handler.
>> >>>> In the asynchronous router implementation, the handler calls
>> >>>> invokeConcurrent simply to convert one request into multiple requests and
>> >>>> add them to the async handler thread pool; the handler can then process
>> >>>> the next request in the call queue. When a connection thread of a
>> >>>> downstream ns receives a response, it hands it over to an async response
>> >>>> thread for processing. The async response thread checks whether all
>> >>>> responses from the downstream ns have been received. If so, it continues
>> >>>> processing the response; otherwise, it moves on to the next response. The
>> >>>> asynchronous router uses CompletableFuture.allOf() to implement
>> >>>> asynchronous invokeConcurrent, so the handler, async handler, async
>> >>>> response, and connection threads never need to wait synchronously.
>> >>>> In addition, the synchronous router has drawbacks not only in multi-ns
>> >>>> environments but also with a single downstream ns: it is often difficult
>> >>>> to decide how many handlers to configure for the router. Too many waste
>> >>>> thread resources, and too few cannot put enough pressure on the
>> >>>> downstream ns. The asynchronous router can push requests to the
>> >>>> downstream ns without having to tune the number of handlers. It can also
>> >>>> connect more easily to additional downstream storage services that
>> >>>> support the HDFS protocol, which gives better scalability.
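
The non-blocking fan-out described above can be illustrated with a minimal,
self-contained sketch. It is not the code from the PR: AsyncFanOutSketch and
the asyncCall parameter are hypothetical, the helper merely borrows the
invokeConcurrent name, and a small thread pool stands in for the downstream
ns. It only shows how CompletableFuture.allOf() lets the caller collect all
responses without waiting on any of them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

public class AsyncFanOutSketch {

    // Hypothetical helper: sends one request to every namespace and returns a
    // future that completes once all responses have arrived. The calling
    // thread is never blocked.
    public static <R> CompletableFuture<Map<String, R>> invokeConcurrent(
            List<String> namespaces,
            Function<String, CompletableFuture<R>> asyncCall) {
        Map<String, CompletableFuture<R>> pending = new LinkedHashMap<>();
        for (String ns : namespaces) {
            pending.put(ns, asyncCall.apply(ns));
        }
        CompletableFuture<Void> all = CompletableFuture.allOf(
                pending.values().toArray(new CompletableFuture[0]));
        return all.thenApply(ignored -> {
            Map<String, R> results = new LinkedHashMap<>();
            // join() does not block here: every future is already complete.
            pending.forEach((ns, future) -> results.put(ns, future.join()));
            return results;
        });
    }

    public static void main(String[] args) {
        // Stand-in for downstream namespaces (an assumption for this demo).
        ExecutorService downstream = Executors.newFixedThreadPool(4);
        try {
            CompletableFuture<Map<String, String>> merged = invokeConcurrent(
                    Arrays.asList("ns0", "ns1", "ns2"),
                    ns -> CompletableFuture.supplyAsync(() -> "ok from " + ns, downstream));
            // Only the demo's main thread waits; a handler would attach a
            // callback instead and return to the call queue.
            System.out.println(merged.join());
        } finally {
            downstream.shutdown();
        }
    }
}
```

If any downstream future fails, the combined future completes exceptionally,
so a real implementation would also need a policy for partial failures.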
>> >>>>
>> >>>> 3. Since I have not yet deployed the asynchronous router to our own
>> >>>> cluster, there is no performance comparison yet. In theory, I expect the
>> >>>> asynchronous router to use more memory than the synchronous router, but
>> >>>> not much more, especially since we can cap the number of requests
>> >>>> entering the router, and CompletableFuture is stable and widely used. In
>> >>>> other respects, it should be far superior to the synchronous router,
>> >>>> especially in scenarios with many downstream ns. If anyone is
>> >>>> interested, you are welcome to help with a performance comparison.
>> >>>>
>> >>>>> On May 21, 2024, at 11:39, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>> >>>>>
>> >>>>> Thanks for this great proposal!
>> >>>>>
>> >>>>> Some questions after reviewing the design doc (sorry, I didn't review
>> >>>>> the PR carefully because it is too large):
>> >>>>> 1. This solution involves updating the RPC framework. Will it affect
>> >>>>> other modules, and how can other modules be kept isolated from these
>> >>>>> changes?
>> >>>>> 2. Some RPC requests should be forwarded concurrently to all downstream
>> >>>>> NS. Does this solution cover that case?
>> >>>>> 3. Since there is an initial implementation, did you collect any
>> >>>>> benchmarks against the current synchronous model of DFSRouter?
>> >>>>> Thanks again.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> - He Xiaoqiao
>> >>>>>
>> >>>>> On Tue, May 21, 2024 at 11:21 AM zhangjian <1361320...@qq.com.invalid>
>> >>>>> wrote:
>> >>>>>
>> >>>>>> Thank you for your positive attitude towards this feature. You can
>> >>>>>> debug the UTs provided in the PR to better understand the current
>> >>>>>> asynchronous calling function.
>> >>>>>>
>> >>>>>>> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <simbadz...@apache.org> wrote:
>> >>>>>>>
>> >>>>>>> Excited to see this feature as well. I'll spend more time
>> >>>>>>> understanding the proposal and implementation.
>> >>>>>>>
>> >>>>>>> On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>>> Hi Yuanbo Liu, thank you for your interest in this feature. I think
>> >>>>>>>> the difficulty of an asynchronous router lies not only in
>> >>>>>>>> implementing the asynchronous functions, but also in keeping the code
>> >>>>>>>> readable and reusable so that the community can develop it further.
>> >>>>>>>> I initially planned to use the virtual threads you mentioned.
>> >>>>>>>> Virtual threads can achieve asynchrony elegantly at the code level,
>> >>>>>>>> but the biggest problem is that upgrading the JDK version is not
>> >>>>>>>> easy, whether in the community or in actual production environments.
>> >>>>>>>> Therefore, I later switched to CompletableFuture, which is already
>> >>>>>>>> supported by JDK 8, to achieve asynchrony. The router is stateless,
>> >>>>>>>> and the router RPC process is very clear. So even though
>> >>>>>>>> CompletableFuture itself is not as readable as virtual threads, a
>> >>>>>>>> good design can still make the asynchronous process look very clear.
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Nice to see this feature brought up. I tried to implement this
>> >>>>>>>>> feature in our internal clusters, and know that it's a very
>> >>>>>>>>> complicated feature. CC hdfs-dev to bring more discussion.
>> >>>>>>>>> By the way, I'm not sure whether virtual threads in a newer JDK
>> >>>>>>>>> would help in this case.
>> >>>>>>>>>
>> >>>>>>>>> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hello everyone, the RPC of the HDFS router currently has some
>> >>>>>>>>>> shortcomings:
>> >>>>>>>>>>
>> >>>>>>>>>> The router's handler thread is currently synchronous. After the
>> >>>>>>>>>> *handler* thread adds a call to connection.calls, it must wait
>> >>>>>>>>>> until the *connection* notifies it that the call is complete. Only
>> >>>>>>>>>> after the response has been put into the response queue can the
>> >>>>>>>>>> handler take a new call from the call queue and process it. The
>> >>>>>>>>>> router's concurrency is therefore limited by the number of
>> >>>>>>>>>> handlers. A simple example: if the number of handlers is 1 and the
>> >>>>>>>>>> maximum number of calls in the connection thread is 10, then even
>> >>>>>>>>>> though the connection thread could send 10 requests to the
>> >>>>>>>>>> downstream ns, the router can only process one request after
>> >>>>>>>>>> another because there is only one handler.
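
The one-handler example above can be reproduced with a small simulation. This
is a sketch under assumptions rather than router code: HandlerBlockingSketch
and its methods are hypothetical, a single-thread executor plays the handler,
a 10-thread pool plays the connection to the downstream ns, and each
downstream call is simulated by a 20 ms sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class HandlerBlockingSketch {

    // Simulated downstream call: records how many calls are in flight at once.
    private static void downstreamCall(AtomicInteger inFlight, AtomicInteger peak) {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(20); // pretend the downstream ns needs 20 ms
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Pushes `calls` requests through ONE handler thread and returns the peak
    // number of downstream calls that were in flight at the same time.
    public static int peakInFlight(int calls, int connectionSlots, boolean async) {
        ExecutorService handler = Executors.newSingleThreadExecutor();
        ExecutorService connection = Executors.newFixedThreadPool(connectionSlots);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            List<CompletableFuture<Void>> responses = CompletableFuture.supplyAsync(() -> {
                List<CompletableFuture<Void>> pending = new ArrayList<>();
                for (int i = 0; i < calls; i++) {
                    CompletableFuture<Void> response = CompletableFuture.runAsync(
                            () -> downstreamCall(inFlight, peak), connection);
                    if (!async) {
                        response.join(); // sync handler: blocked until the response arrives
                    }
                    pending.add(response);
                }
                return pending;
            }, handler).join();
            // The demo (not the handler) waits for the outstanding responses.
            CompletableFuture.allOf(responses.toArray(new CompletableFuture[0])).join();
            return peak.get();
        } finally {
            handler.shutdown();
            connection.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync  peak in-flight: " + peakInFlight(10, 10, false));
        System.out.println("async peak in-flight: " + peakInFlight(10, 10, true));
    }
}
```

In sync mode the peak is always 1, because the single handler waits for each
response before submitting the next call. In async mode the same handler can
keep several of the 10 connection slots busy at once.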
>> >>>>>>>>>>
>> >>>>>>>>>> Since the performance of router RPC is mainly limited by the
>> >>>>>>>>>> number of handlers, the most effective way to improve RPC
>> >>>>>>>>>> performance today is to increase the number of handlers. However,
>> >>>>>>>>>> having the router create a large number of handler threads also
>> >>>>>>>>>> increases thread switching and cannot make full use of the
>> >>>>>>>>>> machine's performance.
>> >>>>>>>>>>
>> >>>>>>>>>> There are usually multiple ns downstream of the router. If a
>> >>>>>>>>>> handler forwards a request to an ns with poor performance, the
>> >>>>>>>>>> handler waits for a long time. With fewer handlers available, the
>> >>>>>>>>>> router's ability to handle requests for ns with normal
>> >>>>>>>>>> performance is reduced. From the client's perspective, the
>> >>>>>>>>>> performance of the ns downstream of the router appears to have
>> >>>>>>>>>> deteriorated at this time. We often find that the call queue of
>> >>>>>>>>>> the downstream ns is not high, but the call queue of the router is
>> >>>>>>>>>> very high.
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore, although the main function of the router is to federate
>> >>>>>>>>>> and handle requests for multiple NSs, the current synchronous RPC
>> >>>>>>>>>> performance cannot satisfy scenarios with many NSs downstream of
>> >>>>>>>>>> the router. Even if the router's concurrency can be improved by
>> >>>>>>>>>> increasing the number of handlers, it is still relatively slow:
>> >>>>>>>>>> more threads increase CPU context switching time, and in practice
>> >>>>>>>>>> many of the handler threads are blocked, which wastes thread
>> >>>>>>>>>> resources. When a request enters the router, there is no guarantee
>> >>>>>>>>>> that a handler is available to run it at that time.
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore, I propose asynchronous router RPC. Please see the issue
>> >>>>>>>>>> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
>> >>>>>>>>>> solution.
>> >>>>>>>>>>
>> >>>>>>>>>> You can also view this PR:
>> >>>>>>>>>> https://github.com/apache/hadoop/pull/6838,
>> >>>>>>>>>> which is just a demo, but it implements the core asynchronous RPC
>> >>>>>>>>>> function. If you think asynchronous routing is feasible, we can
>> >>>>>>>>>> consider splitting this PR for easier review in the future.
>> >>>>>>>>>>
>> >>>>>>>>>> The PDF is attached and can also be viewed through the issue.
>> >>>>>>>>>>
>> >>>>>>>>>> Everyone is welcome to join the discussion!
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >> ---------------------------------------------------------------------
>> >>>>>>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> >>>>>>>> For additional commands, e-mail:
>> common-dev-h...@hadoop.apache.org
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> >>>>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>> >>>>>>
>> >>>>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>
>> >>
>> >
>
>
>
