Good to see this feature brought up. I tried to implement it in our
internal clusters and know that it is very complicated, so I am CCing
hdfs-dev to bring in more discussion.
By the way, I'm not sure whether virtual threads in newer JDK versions
would help in this case.

On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
wrote:

> Hello everyone, the RPC of the HDFS router currently has some
> shortcomings:
>
> The router's handler threads are synchronous. After a *handler* thread
> adds a call to connection.calls, it must wait until the *connection*
> thread notifies it that the call has completed. Only after the response
> is put into the response queue can the handler take a new call from the
> call queue and process it. Therefore, the concurrency of the router is
> limited by the number of handlers. A simple example: if the number of
> handlers is 1 and the maximum number of calls in the connection thread is
> 10, then even though the connection thread can send 10 requests to the
> downstream ns, the single handler means the router can only process one
> request at a time.
>
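
To make the 1-handler / 10-call example above concrete, here is a minimal
Java sketch. It is not code from the router or from the PR. HandlerModel,
peakInFlight and NS_LATENCY_MS are hypothetical names, and the downstream ns
is simulated by a timer that completes each call after a fixed delay. The
sketch counts how many calls a single handler manages to keep in flight when
it blocks on each response versus when it hands the response off and moves
on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HandlerModel {
  // Assumption: a simulated downstream ns that answers every call after 200 ms.
  static final long NS_LATENCY_MS = 200;

  /** Sends `calls` requests from ONE handler and returns the peak number in flight. */
  static int peakInFlight(int calls, boolean async) throws Exception {
    // Stand-in for the connection thread talking to the downstream ns.
    ScheduledExecutorService connection = Executors.newScheduledThreadPool(1);
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<CompletableFuture<String>> pending = new ArrayList<>();
    try {
      for (int i = 0; i < calls; i++) {
        CompletableFuture<String> response = new CompletableFuture<>();
        String id = "call-" + i;
        // Record the call as in flight and track the highest value seen.
        peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
        connection.schedule(() -> {
          inFlight.decrementAndGet();
          response.complete(id + " done");
        }, NS_LATENCY_MS, TimeUnit.MILLISECONDS);
        if (async) {
          pending.add(response); // handler is free to take the next call
        } else {
          response.get();        // handler blocks until this call completes
        }
      }
      // In async mode, wait for all outstanding responses before returning.
      for (CompletableFuture<String> f : pending) {
        f.get();
      }
      return peak.get();
    } finally {
      connection.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("sync  peak in flight: " + peakInFlight(10, false));
    System.out.println("async peak in flight: " + peakInFlight(10, true));
  }
}
```

In this model the blocking handler never has more than one call outstanding,
while the non-blocking one can fill all 10 slots of the connection as long as
it submits them faster than the simulated ns replies.
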
> Since the performance of router rpc is mainly limited by the number of
> handlers, the most effective way to improve rpc performance currently is to
> increase the number of handlers. However, letting the router create a large
> number of handler threads also increases the number of thread context
> switches and does not make full use of the machine.
>
> There are usually multiple ns downstream of the router. If a handler
> forwards a request to an ns with poor performance, the handler has to
> wait for a long time. Because fewer handlers are then available, the
> router's ability to handle requests for ns with normal performance is
> also reduced. From the client's perspective, it looks as if the
> downstream ns of the router has degraded. We often find that the call
> queue of the downstream ns is short while the call queue of the router
> is very long.
>
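
The starvation effect described above can be reproduced with a small Java
sketch. Again this is not router code: SlowNsStarvation, queuedBehindSlowNs,
handlerPool and slowNsResponds are hypothetical names, and a fixed-size
ThreadPoolExecutor stands in for the handler threads. Every handler is made
to block on a slow ns, then one request for a healthy ns is submitted, and
the sketch reports how many requests are stuck in the router-side queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SlowNsStarvation {
  /** Returns the router-side queue length seen while a slow ns holds every handler. */
  static int queuedBehindSlowNs(int handlers) throws Exception {
    // Assumption: a fixed pool models the router's synchronous handler threads.
    ThreadPoolExecutor handlerPool = new ThreadPoolExecutor(
        handlers, handlers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    // The slow ns "responds" only when this latch is released.
    CountDownLatch slowNsResponds = new CountDownLatch(1);
    try {
      // Occupy every handler with a request to the slow ns.
      for (int i = 0; i < handlers; i++) {
        handlerPool.submit(() -> {
          slowNsResponds.await();
          return "slow ns reply";
        });
      }
      // A request for a healthy ns that would return immediately if a handler were free.
      Future<String> fast = handlerPool.submit(() -> "fast ns reply");
      int queued;
      try {
        fast.get(200, TimeUnit.MILLISECONDS);
        queued = 0; // a handler was free after all
      } catch (TimeoutException stuck) {
        // The healthy ns is idle, yet the request waits in the router's queue.
        queued = handlerPool.getQueue().size();
      }
      slowNsResponds.countDown(); // slow ns finally answers
      fast.get();                 // now the fast request can be served
      return queued;
    } finally {
      slowNsResponds.countDown();
      handlerPool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println("queued behind slow ns: " + queuedBehindSlowNs(2));
  }
}
```

The fast request never reaches its ns while the handlers are blocked, which
matches the observation that the router's call queue grows even though the
healthy ns has little queued work.
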
> Therefore, although the main function of the router is to federate and
> handle requests for multiple NSs, the current synchronous RPC cannot
> satisfy the scenario where there are many NSs downstream of the router.
> Even if the concurrency of the router can be improved by increasing the
> number of handlers, it is still relatively slow. More threads increase
> the time the CPU spends on context switching, and in fact many of the
> handler threads are in a blocked state, which is a waste of thread
> resources. When a request enters the router, there is no guarantee that
> an available handler exists at that moment.
>
>
> Therefore, I am proposing an asynchronous router rpc. Please see the issue
> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
> solution.
>
> You can also look at this PR: https://github.com/apache/hadoop/pull/6838,
> which is just a demo, but it completes the core asynchronous RPC function.
> If you think asynchronous routing is feasible, we can consider splitting
> this PR later to make it easier to review.
>
> The PDF is attached and can also be viewed through the issue.
>
> Everyone is welcome to share thoughts and discuss!
>
