Hi Yuanbo Liu, thank you for your interest in this feature. I think the difficulty of an asynchronous router lies not only in implementing the asynchronous functions, but also in keeping the code readable and reusable so that the community can continue to develop it. At the beginning I also planned to use the virtual threads you mentioned. Virtual threads can achieve asynchrony elegantly at the code level, but the biggest problem is that upgrading the JDK version is not easy, whether in the community or in an actual production environment. Therefore, I later switched to CompletableFuture, which has been available since JDK 8, to achieve asynchrony. The router is stateless, and the router RPC process is very clear. So even though CompletableFuture itself is not as readable as virtual threads, if we design it well, we can make the asynchronous process look very clear.
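To make the idea concrete, here is a minimal sketch of the CompletableFuture style, under stated assumptions. It is not the code from the PR: the class AsyncRouterSketch and the methods invokeDownstream, handle and processAll are hypothetical names made up for illustration, and the downstream ns is simulated with a short sleep. The only point is that the thread acting as the handler wires up the pipeline and moves on to the next call instead of blocking until the response arrives.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only; none of these names are Hadoop APIs.
final class AsyncRouterSketch {

    // Hypothetical stand-in for forwarding a call to a downstream ns.
    // The work runs on the connection pool, not on the caller's thread.
    static CompletableFuture<String> invokeDownstream(
            String ns, String request, ExecutorService connectionPool) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(20); // simulated ns latency
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return ns + ":" + request;
        }, connectionPool);
    }

    // The "handler" only builds the pipeline and returns immediately.
    // The later stages run once the downstream result is available.
    static CompletableFuture<String> handle(
            String ns, String request, ExecutorService connectionPool) {
        return invokeDownstream(ns, request, connectionPool)
                .thenApply(result -> "response(" + result + ")")
                .exceptionally(t -> "error(" + t + ")");
    }

    // A single caller thread submits all calls without waiting for any of
    // them, then collects the responses in submission order.
    static List<String> processAll(int calls) {
        ExecutorService connectionPool = Executors.newFixedThreadPool(10);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                pending.add(handle("ns" + (i % 2), "req" + i, connectionPool));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                responses.add(future.join());
            }
            return responses;
        } finally {
            connectionPool.shutdown();
        }
    }

    public static void main(String[] args) {
        processAll(10).forEach(System.out::println);
    }
}
```

In this sketch one submitting thread keeps up to 10 downstream calls in flight at once, whereas a blocking handler would have to finish each call before taking the next one.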

> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>
> Nice to see this feature brought up. I tried to implement this feature in
> our internal clusters, and know that it's a very complicated feature. CC
> hdfs-dev to bring more discussion.
> By the way, I'm not sure whether the virtual threads of newer JDKs will
> help in this case.
>
> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
> wrote:
>
>> Hello everyone, currently there are some shortcomings in the RPC of the
>> HDFS router:
>>
>> Currently the router's handler thread is synchronous. When the *handler*
>> thread adds a call to connection.calls, it has to wait until the
>> *connection* notifies it that the call is complete. Only after the
>> response is put into the response queue can a new call be taken from the
>> call queue and processed. Therefore, the concurrency of the router is
>> limited by the number of handlers. A simple example: if the number of
>> handlers is 1 and the maximum number of calls in the connection thread is
>> 10, then even though the connection thread can send 10 requests to the
>> downstream ns, the router can only process one request after another,
>> because there is only 1 handler.
>>
>> Since the performance of router RPC is mainly limited by the number of
>> handlers, the most effective way to improve RPC performance currently is
>> to increase the number of handlers. However, letting the router create a
>> large number of handler threads also increases the number of thread
>> switches and does not make full use of the machine.
>>
>> There are usually multiple ns downstream of the router. If a handler
>> forwards a request to an ns with poor performance, the handler waits for
>> a long time. With fewer handlers available, the router's ability to
>> handle requests for the ns with normal performance is reduced. From the
>> perspective of the client, the performance of the downstream ns of the
>> router appears to have deteriorated. We often find that the call queue of
>> the downstream ns is not high, but the call queue of the router is very
>> high.
>>
>> Therefore, although the main function of the router is to federate and
>> handle requests for multiple ns, the current synchronous RPC performance
>> cannot satisfy the scenario where there are many ns downstream of the
>> router. Even if the concurrency of the router can be improved by
>> increasing the number of handlers, it is still relatively slow. More
>> threads increase the CPU context-switching time, and in fact many of the
>> handler threads are in a blocked state, which is undoubtedly a waste of
>> thread resources. When a request enters the router, there is no guarantee
>> that a handler will be available at that time.
>>
>> Therefore, I am considering asynchronous router RPC. Please see the
>> issue: https://issues.apache.org/jira/browse/HDFS-17531 for the complete
>> solution.
>>
>> You can also view this PR: https://github.com/apache/hadoop/pull/6838,
>> which is just a demo, but it completes the core asynchronous RPC
>> function. If you think asynchronous routing is feasible, we can consider
>> splitting this PR for easier review in the future.
>>
>> The PDF is attached and can also be viewed through the issue.
>>
>> Everyone is welcome to exchange ideas and discuss!