Great. Thanks for the additional information.

cc @Ayush Saxena <ayush...@gmail.com> @inigo...@apache.org
<inigo...@apache.org> Any more feedback on this proposal?

IMO asynchronous router RPC is a helpful improvement. Based on my
internal practice, it will significantly improve the throughput of
forwarded requests, and it is very valuable to push forward.
Thanks again and good luck!

Best Regards,
- He Xiaoqiao

On Wed, May 22, 2024 at 9:59 AM zhangjian <1361320...@qq.com> wrote:

> Hi, Sangjin Lee, thank you for your attention. I will use my free time to
> run a performance comparison soon.
>
> > On May 22, 2024, at 03:42, Sangjin Lee <sj...@apache.org> wrote:
> >
> > Thanks for the great proposal, Zhangjian. On point #3, I suspect it
> should
> > be fairly straightforward to create a small isolated synthetic test to
> > prove (or disprove) the benefits of this approach. By driving a
> controlled
> > amount of requests per second, you could see latency, memory, CPU, etc.
> > Ideally, it should show meaningful improvements without much degradation
> in
> > other metrics. Would you be able to spend some time doing that?
> >
> > Thanks,
> > Sangjin
> >
> > On Tue, May 21, 2024 at 5:13 AM zhangjian <1361320...@qq.com.invalid>
> wrote:
> >
> >> Hi, xiaoqiao he, thank you for your reply.
> >>
> >> 1. Currently, the server and client protocols within the router can be
> >> implemented by extending the existing protocols and adding asynchronous
> >> functionality, so the existing synchronous protocols are not affected.
> >> RouterClientNamenodeProtocolServerSideTranslatorPB
> >> RouterClientProtocolTranslatorPB
> >> RouterGetUserMappingsProtocolServerSideTranslatorPB
> >> RouterGetUserMappingsProtocolTranslatorPB
> >> RouterNamenodeProtocolServerSideTranslatorPB
> >> RouterNamenodeProtocolTranslatorPB
> >> RouterRefreshUserMappingsProtocolServerSideTranslatorPB
> >> RouterRefreshUserMappingsProtocolTranslatorPB
> >>
> >> The following issues already implemented asynchronous callbacks for the
> >> RPC server, but I have not found any other module that uses them:
> >> Server: HADOOP-11552, HADOOP-17046
> >> The asynchronous RPC client implementation builds directly on this issue:
> >> Client: HADOOP-13226
> >> Therefore, I believe the asynchronous router's changes to the RPC
> >> protocol, RPC server, and client are safe.
> >>
> >> 2. When a request must be forwarded to multiple downstream ns, the
> >> synchronous router handler submits one request per downstream ns to a
> >> thread pool (RouterRpcClient.executorService) and then waits for the
> >> responses from all downstream ns before returning. Since the threads in
> >> that pool also process RPC requests synchronously, just like a handler,
> >> the number of threads in the pool directly affects the performance of
> >> invokeConcurrent, which in turn affects the performance of the handler.
> >>
> >> In the asynchronous router implementation, the handler calls
> >> invokeConcurrent only to split a request into multiple requests and add
> >> them to the async handler thread pool; the handler can then process the
> >> next request in the call queue. When a connection thread of a downstream
> >> ns receives a response, it hands the response over to an async response
> >> thread. The async response thread checks whether responses from all
> >> downstream ns have arrived. If so, it continues processing the
> >> response; otherwise, it moves on to the next response. The asynchronous
> >> router uses CompletableFuture.allOf() to implement the asynchronous
> >> invokeConcurrent, so the handler, async handler, async response, and
> >> connection threads never need to wait synchronously.
> >>
> >> In addition, the drawbacks of synchronous routers are not limited to
> >> multi-ns environments. Even with a single downstream ns, it is often
> >> difficult to decide how many handlers to configure for the router: too
> >> many wastes thread resources, and too few cannot put enough pressure on
> >> the downstream ns. Asynchronous routers can push requests to the
> >> downstream ns without having to tune the handler count. They can also
> >> connect more easily to additional downstream storage services that
> >> support the HDFS protocol, which gives better scalability.
> >>
> >> 3. Since I have not yet deployed asynchronous routers to our own cluster,
> >> there is no performance comparison yet. In theory, I expect asynchronous
> >> routers to use more memory than synchronous routers, but not much more,
> >> especially since we can cap the maximum number of requests entering the
> >> router, and CompletableFuture is stable and widely used. In other
> >> respects, it should be far superior to synchronous routers, especially
> >> in scenarios with many downstream ns. If anyone is interested, help with
> >> a performance comparison would be welcome.
> >>
> >>> On May 21, 2024, at 11:39, Xiaoqiao He <hexiaoq...@apache.org> wrote:
> >>>
> >>> Thanks for this great proposal!
> >>>
> >>> Some questions after reviewing the design doc (sorry, I didn't review
> >>> the PR carefully because it is too large):
> >>> 1. This solution involves an RPC framework update. Will it affect other
> >>> modules, and how do we keep other modules isolated from these changes?
> >>> 2. Some RPC requests must be forwarded concurrently to all downstream
> >>> NS. Does this solution cover that case?
> >>> 3. Given that there is an initial implementation, did you collect any
> >>> benchmarks against the current synchronous model of DFSRouter?
> >>> Thanks again.
> >>>
> >>> Best Regards,
> >>> - He Xiaoqiao
> >>>
> >>> On Tue, May 21, 2024 at 11:21 AM zhangjian <1361320...@qq.com.invalid>
> >>> wrote:
> >>>
> >>>> Thank you for your positive response to this feature. You can debug
> >>>> the unit tests provided in the PR to better understand how the
> >>>> asynchronous calls currently work.
> >>>>
> >>>>> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <simbadz...@apache.org> wrote:
> >>>>>
> >>>>> Excited to see this feature as well. I'll spend more time
> understanding
> >>>> the
> >>>>> proposal and implementation.
> >>>>>
> >>>>> On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid
> >
> >>>> wrote:
> >>>>>
> >>>>>> Hi, Yuanbo Liu, thank you for your interest in this feature. I think
> >>>>>> the difficulty of an asynchronous router lies not only in implementing
> >>>>>> the asynchronous functions, but also in keeping the code readable and
> >>>>>> reusable, so that the community can continue to develop it. At the
> >>>>>> beginning I also planned to use the virtual threads you mentioned.
> >>>>>> Virtual threads can achieve asynchrony elegantly at the code level,
> >>>>>> but the biggest problem is that upgrading the JDK version is not easy,
> >>>>>> either in the community or in actual production environments.
> >>>>>> Therefore, I later used CompletableFuture, which is already available
> >>>>>> in JDK 8, to achieve asynchrony. The router is stateless, and the
> >>>>>> router RPC process is very clear. Therefore, even though
> >>>>>> CompletableFuture itself is not as readable as virtual threads, a good
> >>>>>> design can make the asynchronous process look very clear.
> >>>>>>
> >>>>>>
> >>>>>>> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Nice to see this feature brought up. I tried to implement this feature
> >>>>>>> in our internal clusters, and I know that it is very complicated.
> >>>>>>> CC'ing hdfs-dev to bring in more discussion.
> >>>>>>> By the way, I'm not sure whether virtual threads in newer JDK versions
> >>>>>>> would help in this case.
> >>>>>>>
> >>>>>>> On Mon, May 20, 2024 at 10:10 AM zhangjian
> <1361320...@qq.com.invalid
> >>>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hello everyone, the RPC of the HDFS router currently has some
> >>>>>>>> shortcomings:
> >>>>>>>>
> >>>>>>>> The router's handler threads are currently synchronous. After a
> >>>>>>>> *handler* thread adds a call to connection.calls, it must wait until
> >>>>>>>> the *connection* thread notifies it that the call is complete. Only
> >>>>>>>> after the response is put into the response queue can the handler take
> >>>>>>>> a new call from the call queue and process it. Therefore, the
> >>>>>>>> concurrency of the router is limited by the number of handlers. A
> >>>>>>>> simple example: if the number of handlers is 1 and the maximum number
> >>>>>>>> of calls in the connection thread is 10, then even though the
> >>>>>>>> connection thread could send 10 requests to the downstream ns, the
> >>>>>>>> router can only process one request at a time.
> >>>>>>>>
> >>>>>>>> Since the performance of router RPC is mainly limited by the number of
> >>>>>>>> handlers, the most effective way to improve RPC performance today is
> >>>>>>>> to increase the number of handlers. However, letting the router create
> >>>>>>>> a large number of handler threads also increases the number of thread
> >>>>>>>> switches and cannot make full use of the machine.
> >>>>>>>>
> >>>>>>>> There are usually multiple ns downstream of the router. If a handler
> >>>>>>>> forwards a request to an ns with poor performance, the handler waits
> >>>>>>>> for a long time. With fewer handlers available, the router's ability
> >>>>>>>> to serve requests for ns that perform normally is reduced. From the
> >>>>>>>> client's perspective, the downstream ns of the router appear to have
> >>>>>>>> degraded. We often find that the call queue of the downstream ns is
> >>>>>>>> short while the call queue of the router is very long.
> >>>>>>>>
> >>>>>>>> Therefore, although the main function of the router is to federate and
> >>>>>>>> handle requests for multiple NSs, the current synchronous RPC cannot
> >>>>>>>> satisfy scenarios with many NSs downstream of the router. Even if the
> >>>>>>>> router's concurrency can be improved by increasing the number of
> >>>>>>>> handlers, it is still relatively slow. More threads increase CPU
> >>>>>>>> context-switching time, and in fact many of the handler threads are
> >>>>>>>> blocked, which is undoubtedly a waste of thread resources. When a
> >>>>>>>> request enters the router, there is no guarantee that a handler is
> >>>>>>>> available to run it at that moment.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Therefore, I propose asynchronous router RPC. Please see the issue
> >>>>>>>> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
> >>>>>>>> solution.
> >>>>>>>>
> >>>>>>>> You can also look at this PR:
> >>>>>>>> https://github.com/apache/hadoop/pull/6838. It is just a demo, but it
> >>>>>>>> implements the core asynchronous RPC functionality. If you think
> >>>>>>>> asynchronous routing is feasible, we can consider splitting this PR
> >>>>>>>> later to make it easier to review.
> >>>>>>>>
> >>>>>>>> The PDF is attached and can also be viewed through the issue.
> >>>>>>>>
> >>>>>>>> Everyone is welcome to share feedback and discuss!
> >>>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> >>>>>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
> >>>>>>
> >>>>>>
> >>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
> >>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
> >>>>
> >>>>
> >>
> >>
> >>
> >>
> >
>
>
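
As an illustration of point 2 in zhangjian's reply quoted above, here is a
minimal Java 8 sketch of a non-blocking fan-out built on
CompletableFuture.allOf(). It is not the code from the PR: the class
AsyncFanOutSketch, the method invokeConcurrentAsync, and the nameservice IDs
are hypothetical names chosen for illustration, and the downstream call is
simulated with a small thread pool instead of a real RPC client.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

final class AsyncFanOutSketch {

  /**
   * Hypothetical stand-in for an asynchronous invokeConcurrent: start one
   * asynchronous call per nameservice and return a future that completes
   * once every call has completed. The calling thread never blocks.
   */
  static <T> CompletableFuture<Map<String, T>> invokeConcurrentAsync(
      List<String> nameservices,
      Function<String, CompletableFuture<T>> call) {
    // Start all downstream calls first; nothing waits here.
    Map<String, CompletableFuture<T>> pending = new LinkedHashMap<>();
    for (String ns : nameservices) {
      pending.put(ns, call.apply(ns));
    }
    CompletableFuture<Void> all = CompletableFuture.allOf(
        pending.values().toArray(new CompletableFuture<?>[0]));
    // This stage runs only after every future has completed normally,
    // so join() below returns immediately instead of blocking.
    return all.thenApply(ignored -> {
      Map<String, T> results = new LinkedHashMap<>();
      pending.forEach((ns, future) -> results.put(ns, future.join()));
      return results;
    });
  }

  public static void main(String[] args) {
    // Assumption: a two-thread pool simulates the downstream connections.
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      List<String> nameservices = Arrays.asList("ns0", "ns1", "ns2");
      CompletableFuture<Map<String, String>> combined =
          invokeConcurrentAsync(nameservices,
              ns -> CompletableFuture.supplyAsync(() -> "reply-from-" + ns, pool));
      // A handler would return here and pick up the next call;
      // the demo joins only so that it can print the combined result.
      System.out.println(combined.join());
    } finally {
      pool.shutdown();
    }
  }
}
```

If any downstream future fails, allOf() completes exceptionally and the
combined future propagates that failure, so a real router would still need to
decide per call whether partial results are acceptable.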
