Thank you for your interest in this feature. You can debug the UTs provided in the PR to better understand how the current asynchronous call flow works.
> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <simbadz...@apache.org> wrote:
>
> Excited to see this feature as well. I'll spend more time understanding the
> proposal and implementation.
>
> On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid> wrote:
>
>> Hi Yuanbo Liu, thank you for your interest in this feature. I think the
>> difficulty of an asynchronous router lies not only in implementing the
>> asynchronous functions, but also in keeping the code readable and
>> reusable, so that the community can continue to develop it. At first I
>> also planned to use the virtual threads you mentioned. Virtual threads can
>> make the code asynchronous elegantly, but the biggest problem is that
>> upgrading the JDK version is not easy, either in the community or in real
>> production environments. Therefore, I later used CompletableFuture, which
>> is already available in JDK 8, to make the calls asynchronous. The router
>> is stateless and the router RPC process is very clear, so even though
>> CompletableFuture itself is not as readable as virtual threads, a good
>> design can still make the asynchronous process easy to follow.
>>
>>
>>> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>>>
>>> Nice to see this feature brought up. I tried to implement this feature in
>>> our internal clusters, and I know that it's a very complicated feature. CC
>>> hdfs-dev to bring more discussion.
>>> By the way, I'm not sure whether virtual threads in a newer JDK would help
>>> in this case.
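As a rough illustration of the CompletableFuture approach (JDK 8 API only), a single forwarded call can be written as a chain of stages that reads top to bottom without any thread blocking on the downstream namenode. This is a minimal sketch under stated assumptions: `resolveNameservice`, `invokeDownstream`, and the `IO_POOL` executor are hypothetical stand-ins, not the classes used in HDFS-17531 or PR 6838.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical sketch: one router-style call expressed as a CompletableFuture
 * chain. All names here are illustrative, not the actual HDFS router classes.
 */
public class AsyncRouterChain {
    // Hypothetical stand-in for the connection threads that talk to namenodes.
    // Daemon threads so the JVM can exit without an explicit shutdown.
    static final ExecutorService IO_POOL = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r, "downstream-io");
        t.setDaemon(true);
        return t;
    });

    // Step 1 (hypothetical): map a path to a downstream nameservice.
    static String resolveNameservice(String path) {
        return path.startsWith("/data") ? "ns1" : "ns0";
    }

    // Step 2 (hypothetical): "send" the RPC asynchronously and return at once.
    static CompletableFuture<String> invokeDownstream(String ns, String method, String path) {
        return CompletableFuture.supplyAsync(
            () -> ns + ":" + method + "(" + path + ")", IO_POOL);
    }

    // The stages are declared in order; none of them waits for the namenode.
    static CompletableFuture<String> getFileInfo(String path) {
        return CompletableFuture.completedFuture(path)
            .thenApply(AsyncRouterChain::resolveNameservice)
            .thenCompose(ns -> invokeDownstream(ns, "getFileInfo", path))
            .thenApply(result -> "OK " + result)
            .exceptionally(t -> "ERROR " + t.getMessage());
    }

    public static void main(String[] args) {
        // join() is only used here to print the result of the demo.
        System.out.println(getFileInfo("/data/a.txt").join()); // OK ns1:getFileInfo(/data/a.txt)
    }
}
```

Each step only describes what happens when the previous stage completes, which is what keeps the flow readable despite being asynchronous; error handling is collected in a single `exceptionally` stage at the end.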
>>>
>>> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
>>> wrote:
>>>
>>>> Hello everyone, the RPC layer of the HDFS router currently has some
>>>> shortcomings:
>>>>
>>>> The router's handler threads are synchronous. After a *handler* thread
>>>> adds a call to connection.calls, it must wait until the *connection*
>>>> notifies it that the call is complete; only after the response has been
>>>> put into the response queue can the handler take a new call from the
>>>> call queue and process it. Therefore, the concurrency of the router is
>>>> limited by the number of handlers. A simple example: if there is 1
>>>> handler and the connection thread can hold at most 10 calls, then even
>>>> though the connection thread could send 10 requests to the downstream
>>>> ns, the router can only process one request after another, because
>>>> there is only one handler.
>>>>
>>>> Since router RPC performance is mainly limited by the number of
>>>> handlers, the most effective way to improve it today is to increase the
>>>> number of handlers. However, letting the router create a large number
>>>> of handler threads also increases thread switching and still cannot
>>>> make full use of the machine.
>>>>
>>>> There are usually multiple ns downstream of the router. If a handler
>>>> forwards a request to an ns with poor performance, that handler waits
>>>> for a long time. With fewer handlers available, the router's ability to
>>>> serve requests for the ns that are performing normally also drops. From
>>>> the client's perspective, it looks as if the performance of the
>>>> router's downstream ns has deteriorated. We often find that the call
>>>> queue of the downstream ns is not high, while the call queue of the
>>>> router is very high.
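To make the 1-handler / 10-call example concrete, here is a small self-contained Java simulation. It is not router code: the single-thread `handler` executor, the 10-thread `connection` pool, and the 50 ms downstream latency are assumptions chosen only to model the blocking pattern described above. It reports the peak number of downstream calls that are in flight at the same time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical model of one handler thread in front of a 10-call connection. */
public class HandlerBottleneck {
    static final int CONNECTION_CAPACITY = 10;    // assumed max outstanding calls
    static final long DOWNSTREAM_LATENCY_MS = 50; // assumed namenode latency

    // Returns the peak number of downstream calls that were in flight at once.
    static int run(int requests, boolean async) {
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        ExecutorService connection = Executors.newFixedThreadPool(CONNECTION_CAPACITY);
        ExecutorService handler = Executors.newSingleThreadExecutor(); // 1 handler
        List<CompletableFuture<Void>> responses = new ArrayList<>();

        for (int i = 0; i < requests; i++) {
            CompletableFuture<Void> response = new CompletableFuture<>();
            responses.add(response);
            handler.execute(() -> {
                // Simulated downstream RPC running on a connection thread.
                CompletableFuture<Void> call = CompletableFuture.runAsync(() -> {
                    peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
                    sleepQuietly(DOWNSTREAM_LATENCY_MS);
                    inFlight.decrementAndGet();
                }, connection);
                if (async) {
                    // Async: register a callback and return, freeing the handler.
                    call.whenComplete((v, t) -> response.complete(null));
                } else {
                    // Sync: the handler blocks until the downstream call finishes.
                    call.join();
                    response.complete(null);
                }
            });
        }
        CompletableFuture.allOf(responses.toArray(new CompletableFuture<?>[0])).join();
        handler.shutdown();
        connection.shutdown();
        return peak.get();
    }

    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("sync peak in-flight:  " + run(10, false)); // always 1
        System.out.println("async peak in-flight: " + run(10, true));  // up to 10
    }
}
```

In synchronous mode the peak is always 1, because the only handler blocks in `join()` until each call finishes. In asynchronous mode the handler only registers a callback, so up to `CONNECTION_CAPACITY` calls can overlap; the exact peak depends on thread scheduling.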
>>>>
>>>> Therefore, although the main function of the router is to federate and
>>>> handle requests for multiple NSs, the current synchronous RPC
>>>> performance cannot satisfy scenarios with many NSs downstream of the
>>>> router. Even if the router's concurrency can be improved by increasing
>>>> the number of handlers, it is still relatively slow: more threads
>>>> increase CPU context-switching time, and in fact many of the handler
>>>> threads sit in a blocked state, which wastes thread resources. When a
>>>> request enters the router, there is no guarantee that a handler is free
>>>> to run it at that moment.
>>>>
>>>>
>>>> Therefore, I propose making router RPC asynchronous. Please see
>>>> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
>>>> solution.
>>>>
>>>> You can also look at this PR: https://github.com/apache/hadoop/pull/6838.
>>>> It is just a demo, but it implements the core asynchronous RPC function.
>>>> If you think an asynchronous router is feasible, we can consider
>>>> splitting this PR to make it easier to review.
>>>>
>>>> The PDF is attached and can also be viewed through the issue.
>>>>
>>>> Everyone is welcome to share ideas and discuss!
>>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org