Re: [Discuss] RBF: Asynchronous router RPC.

Simbarashe Dzinamarira Mon, 20 May 2024 11:05:27 -0700

Excited to see this feature as well. I'll spend more time understanding the
proposal and implementation.


On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid> wrote:

> Hi Yuanbo Liu, thank you for your interest in this feature. I think the
> difficulty of an asynchronous router lies not only in implementing the
> asynchronous functions, but also in keeping the code readable and
> reusable, so that the community can continue to develop it. At the
> beginning I also planned to use the virtual threads you mentioned.
> Virtual threads can make code asynchronous elegantly, but the biggest
> problem is that upgrading the JDK version is not easy, either in the
> community or in real production environments. Therefore, I later used
> CompletableFuture, which is already supported by JDK 8, to make the
> router asynchronous. The router is stateless, and the router RPC process
> is very clear. Therefore, even if CompletableFuture itself is not as
> readable as virtual threads, a good design can still make the
> asynchronous process look very clear.
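
As an illustration of the point about CompletableFuture (this is not code from the thread, the JIRA, or the PR), here is a minimal sketch of how a router call could be written as a pipeline of stages on JDK 8. The class name AsyncRouterSketch, the methods resolveNamespace, invokeDownstream and getFileInfo, and the "ns0"/"ns1" namespaces are all hypothetical stand-ins chosen for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, not Hadoop code: each step of a router call is one
// stage in a CompletableFuture pipeline, so no thread blocks while waiting
// for the downstream namespace.
class AsyncRouterSketch {
    // Stand-in for the connection threads that talk to the namespaces.
    private final ExecutorService connectionPool = Executors.newFixedThreadPool(4);

    // Step 1: map a path to a namespace (assumed, simplified resolution rule).
    String resolveNamespace(String path) {
        return path.startsWith("/data") ? "ns1" : "ns0";
    }

    // Step 2: forward the call; the future completes on a connection thread.
    CompletableFuture<String> invokeDownstream(String ns, String path) {
        return CompletableFuture.supplyAsync(() -> ns + ":" + path, connectionPool);
    }

    // The whole call reads top to bottom, like the synchronous version would.
    CompletableFuture<String> getFileInfo(String path) {
        return CompletableFuture.completedFuture(path)
            .thenApply(this::resolveNamespace)
            .thenCompose(ns -> invokeDownstream(ns, path))
            .thenApply(raw -> "status(" + raw + ")")               // Step 3: post-process
            .exceptionally(t -> "error(" + t.getMessage() + ")");  // one place for failures
    }

    void shutdown() {
        connectionPool.shutdown();
    }

    public static void main(String[] args) {
        AsyncRouterSketch router = new AsyncRouterSketch();
        try {
            System.out.println(router.getFileInfo("/data/logs").join());
        } finally {
            router.shutdown();
        }
    }
}
```

The caller only blocks in main for the demo; in an asynchronous router the final stage would hand the response to the responder instead of calling join().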
>
>
> > On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> >
> > Nice to see this feature brought up. I tried to implement this feature in
> > our internal clusters, and I know that it is very complicated. CC'ing
> > hdfs-dev to bring in more discussion.
> > By the way, I'm not sure whether the virtual threads of newer JDKs would
> > help in this case.
> >
> > On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
> > wrote:
> >
> >> Hello everyone, currently there are some shortcomings in the RPC of the
> >> HDFS router:
> >>
> >> Currently the router's handler threads are synchronous. When a
> >> *handler* thread adds a call to connection.calls, it must wait until the
> >> *connection* notifies it that the call has completed. Only after the
> >> response is put into the response queue can the handler take a new call
> >> from the call queue and process it. Therefore, the concurrency of the
> >> router is limited by the number of handlers. A simple example: if the
> >> number of handlers is 1 and the maximum number of calls in the
> >> connection thread is 10, then even though the connection thread can
> >> send 10 requests to the downstream ns, the router can only process one
> >> request after another because there is only 1 handler.
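
The 1-handler, 10-call example above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Hadoop code: the class HandlerBoundDemo, the method peakInFlight, and the 100 ms simulated namespace latency are all made up for illustration. It counts how many downstream calls are in flight at once when the single handler blocks on each call, compared with when it only submits the call and moves on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical toy model, not Hadoop code: one "handler" thread forwards
// calls to a "connection" pool that can keep 10 calls in flight.
class HandlerBoundDemo {
    static final int CALLS = 10;

    /** Returns the peak number of downstream calls that were in flight at once. */
    static int peakInFlight(boolean async) {
        ExecutorService handler = Executors.newSingleThreadExecutor();
        ExecutorService connection = Executors.newFixedThreadPool(CALLS);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<CompletableFuture<Void>> responses = new ArrayList<>();
        try {
            for (int i = 0; i < CALLS; i++) {
                CompletableFuture<Void> response = new CompletableFuture<>();
                responses.add(response);
                handler.execute(() -> {
                    CompletableFuture<Void> call = CompletableFuture.runAsync(() -> {
                        int now = inFlight.incrementAndGet();
                        peak.accumulateAndGet(now, Math::max);
                        sleep(100); // simulated namespace latency (assumed value)
                        inFlight.decrementAndGet();
                    }, connection);
                    call.whenComplete((v, t) -> response.complete(null));
                    if (!async) {
                        call.join(); // synchronous handler: block until the call returns
                    }
                });
            }
            // Wait until every call has produced its response.
            CompletableFuture.allOf(responses.toArray(new CompletableFuture[0])).join();
        } finally {
            handler.shutdown();
            connection.shutdown();
        }
        return peak.get();
    }

    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync peak in flight:  " + peakInFlight(false));
        System.out.println("async peak in flight: " + peakInFlight(true));
    }
}
```

In the synchronous mode the peak is 1 no matter how large the connection pool is, which is the handler bound described above. In the asynchronous mode the same single handler can keep several calls in flight at once.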
> >>
> >> Since the performance of router RPC is mainly limited by the number of
> >> handlers, the most effective way to improve RPC performance currently is
> >> to increase the number of handlers. However, letting the router create a
> >> large number of handler threads also increases the number of thread
> >> switches and cannot make full use of the machine.
> >>
> >> There are usually multiple ns downstream of the router. If a handler
> >> forwards a request to an ns with poor performance, the handler waits
> >> for a long time. Because fewer handlers are then available, the
> >> router's ability to handle requests for ns with normal performance is
> >> reduced. From the perspective of the client, the performance of the
> >> downstream ns of the router appears to have deteriorated at this time.
> >> We often find that the call queue of the downstream ns is not high, but
> >> the call queue of the router is very high.
> >>
> >> Therefore, although the main function of the router is to federate and
> >> handle requests from multiple NSs, the current synchronous RPC
> >> performance cannot satisfy the scenario where there are many NSs
> >> downstream of the router. Even if the concurrency of the router can be
> >> improved by increasing the number of handlers, it is still relatively
> >> slow. More threads increase the CPU context-switching time, and in fact
> >> many of the handler threads are in a blocked state, which is undoubtedly
> >> a waste of thread resources. When a request enters the router, there is
> >> no guarantee that a handler will be free to process it at that time.
> >>
> >>
> >> Therefore, I am proposing asynchronous router RPC. Please see the issue
> >> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
> >> solution.
> >>
> >> You can also view this PR: https://github.com/apache/hadoop/pull/6838,
> >> which is just a demo, but it completes the core asynchronous RPC
> >> function. If you think asynchronous routing is feasible, we can consider
> >> splitting this PR for easier review in the future.
> >>
> >> The PDF is attached and can also be viewed through the issue.
> >>
> >> Everyone is welcome to exchange ideas and discuss!
> >>
