[Discuss] RBF: Asynchronous router RPC.

zhangjian Sun, 19 May 2024 18:22:13 -0700

Hello everyone, the RPC of the HDFS router currently has some shortcomings:


The router's handler threads are currently synchronous. After a handler thread 
adds a call to connection.calls, it must wait until the connection notifies it 
that the call has completed. Only after the response is put into the response 
queue can the handler take a new call from the call queue and process it. 
Therefore, the concurrency of the router is limited by the number of handlers. 
A simple example: if the number of handlers is 1 and the maximum number of 
calls in the connection thread is 10, then even though the connection thread 
can send 10 requests to the downstream ns, the router can only process one 
request at a time because there is only 1 handler. 
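
The example above can be modeled with a small standalone Java sketch. This is 
not router code: HandlerModel, Downstream, runSync and runAsync are 
hypothetical names, and the 100 ms delay is an assumed downstream latency. It 
only counts how many calls are in flight at the downstream ns at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class HandlerModel {
    /** Hypothetical stand-in for a downstream ns; records peak concurrency. */
    static final class Downstream {
        private final ScheduledExecutorService timer =
            Executors.newScheduledThreadPool(1);
        private final AtomicInteger inFlight = new AtomicInteger();
        private final AtomicInteger maxInFlight = new AtomicInteger();

        /** Returns at once; the reply arrives after an assumed 100 ms. */
        CompletableFuture<String> call(String req) {
            maxInFlight.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            CompletableFuture<String> reply = new CompletableFuture<>();
            timer.schedule(() -> {
                inFlight.decrementAndGet();
                reply.complete("resp:" + req);
            }, 100, TimeUnit.MILLISECONDS);
            return reply;
        }

        int maxInFlight() { return maxInFlight.get(); }
        void shutdown() { timer.shutdown(); }
    }

    /** Synchronous model: each handler blocks until the reply arrives. */
    static int runSync(int handlers, int requests) {
        return run(handlers, requests, false);
    }

    /** Asynchronous model: the handler only issues the call and moves on. */
    static int runAsync(int handlers, int requests) {
        return run(handlers, requests, true);
    }

    private static int run(int handlers, int requests, boolean async) {
        Downstream ns = new Downstream();
        ExecutorService pool = Executors.newFixedThreadPool(handlers);
        List<CompletableFuture<String>> results = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            String req = "r" + i;
            if (async) {
                // The handler thread returns as soon as the call is issued;
                // the reply is delivered later by the timer thread.
                results.add(CompletableFuture
                    .supplyAsync(() -> ns.call(req), pool)
                    .thenCompose(f -> f));
            } else {
                // The handler thread is parked in join() for the whole call.
                results.add(CompletableFuture
                    .supplyAsync(() -> ns.call(req).join(), pool));
            }
        }
        CompletableFuture.allOf(results.toArray(new CompletableFuture[0])).join();
        pool.shutdown();
        ns.shutdown();
        return ns.maxInFlight();
    }

    public static void main(String[] args) {
        System.out.println("sync  max in flight: " + runSync(1, 10));
        System.out.println("async max in flight: " + runAsync(1, 10));
    }
}
```

With 1 handler and 10 requests, the synchronous model never has more than one 
call in flight, while the asynchronous model can keep several calls in flight 
with the same single handler.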
 
Since the performance of router RPC is mainly limited by the number of 
handlers, the most effective way to improve RPC performance today is to 
increase the number of handlers. However, having the router create a large 
number of handler threads also increases the number of thread switches and 
does not make full use of the machine.
 
There are usually multiple ns downstream of the router. If a handler forwards 
a request to an ns with poor performance, the handler waits for a long time. 
With fewer handlers available, the router's ability to handle requests for ns 
with normal performance is reduced. From the client's perspective, the 
performance of those downstream ns appears to have deteriorated. We often find 
that the call queue of the downstream ns is not high, but the call queue of 
the router is very high.
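
The effect described above can be reproduced with a small standalone Java 
sketch. Again this is only a model under stated assumptions, not router code: 
SlowNamespaceModel and fastFinishesWhileSlowPending are hypothetical names, the 
router is assumed to have 2 handlers, and the slow ns is modeled as a reply 
that does not arrive during the test.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class SlowNamespaceModel {
    /**
     * Returns true if a request for the fast ns finishes while two requests
     * for the slow ns are still outstanding.
     */
    static boolean fastFinishesWhileSlowPending(boolean async) {
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        // Reply from the slow ns; it stays pending until the finally block.
        CompletableFuture<String> slowNsReply = new CompletableFuture<>();
        try {
            for (int i = 0; i < 2; i++) {
                if (async) {
                    // The handler issues the call and is free again at once.
                    CompletableFuture
                        .supplyAsync(() -> slowNsReply, handlers)
                        .thenCompose(f -> f);
                } else {
                    // The handler blocks until the slow ns replies.
                    CompletableFuture.supplyAsync(slowNsReply::join, handlers);
                }
            }
            // A request for a healthy ns that could be answered immediately.
            CompletableFuture<String> fast =
                CompletableFuture.supplyAsync(() -> "fast-ok", handlers);
            try {
                fast.get(500, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // still waiting in the router's call queue
            } catch (Exception e) {
                throw new IllegalStateException(e);
            }
        } finally {
            slowNsReply.complete("slow-ok"); // unblock any parked handlers
            handlers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync:  " + fastFinishesWhileSlowPending(false));
        System.out.println("async: " + fastFinishesWhileSlowPending(true));
    }
}
```

In the synchronous case both handlers are parked on the slow ns, so the fast 
request stays queued in the router even though its own ns is idle. In the 
asynchronous case the handlers are released as soon as the calls are issued, 
so the fast request is served right away.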
 
Therefore, although the main function of the router is to federate and handle 
requests for multiple NSs, the current synchronous RPC cannot perform well 
when there are many NSs downstream of the router. Even if the concurrency of 
the router can be improved by increasing the number of handlers, it is still 
relatively slow. More threads increase CPU context-switching time, and in fact 
many of the handler threads are blocked, which wastes thread resources. When a 
request enters the router, there is no guarantee that a handler will be 
available to process it at that time.


Therefore, I propose making the router RPC asynchronous. Please see the issue 
https://issues.apache.org/jira/browse/HDFS-17531 for the complete solution.

You can also look at this PR: https://github.com/apache/hadoop/pull/6838. It 
is just a demo, but it implements the core asynchronous RPC function. If you 
think asynchronous routing is feasible, we can consider splitting this PR up 
for easier review in the future.

Everyone is welcome to join the discussion!
