Excited to see this feature as well. I'll spend more time understanding the proposal and implementation.

On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid> wrote:

> Hi Yuanbo Liu, thank you for your interest in this feature. I think the
> difficulty of an asynchronous router lies not only in implementing the
> asynchronous functions, but also in keeping the code readable and
> reusable, so that the community can keep developing it. At first I also
> planned to use the virtual threads you mentioned. Virtual threads can
> make the code asynchronous elegantly, but the biggest problem is that
> upgrading the JDK version is not easy, whether in the community or in
> actual production environments. Therefore, I later used
> CompletableFuture, which JDK 8 already supports, to make the router
> asynchronous. The router is stateless and the router RPC process is very
> clear, so even though CompletableFuture itself is not as readable as
> virtual threads, a good design can still make the asynchronous process
> look very clear.
>
> > On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
> >
> > Nice to see this feature brought up. I tried to implement this feature
> > in our internal clusters, and I know that it's very complicated. CC'ing
> > hdfs-dev to bring in more discussion.
> > By the way, I'm not sure whether the virtual threads of newer JDK
> > versions would help in this case.
> >
> > On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
> > wrote:
> >
> >> Hello everyone, the RPC of the HDFS router currently has some
> >> shortcomings:
> >>
> >> The router's handler thread is currently synchronous. After the
> >> *handler* thread adds a call to connection.calls, it has to wait until
> >> the *connection* thread notifies it that the call is complete. Only
> >> after the response is put into the response queue can the handler take
> >> a new call from the call queue and process it.
> >> Therefore, the concurrency of the router is limited by the number of
> >> handlers. A simple example: if the number of handlers is 1 and the
> >> maximum number of calls in the connection thread is 10, then even
> >> though the connection thread can send 10 requests to the downstream
> >> NS, the router can only process one request after another, because
> >> there is only one handler.
> >>
> >> Since the performance of router RPC is mainly limited by the number of
> >> handlers, the most effective way to improve RPC performance today is
> >> to increase the number of handlers. However, letting the router create
> >> a large number of handler threads also increases the number of thread
> >> switches and cannot make full use of the machine.
> >>
> >> There are usually multiple NSs downstream of the router. If a handler
> >> forwards a request to an NS with poor performance, the handler waits
> >> for a long time. With fewer handlers available, the router's ability
> >> to handle requests for NSs with normal performance is reduced. From
> >> the client's perspective, the performance of those downstream NSs
> >> appears to have deteriorated. We often find that the call queue of the
> >> downstream NS is not high, but the call queue of the router is very
> >> high.
> >>
> >> Therefore, although the main function of the router is to federate and
> >> handle requests from multiple NSs, the current synchronous RPC
> >> performance cannot satisfy the scenario where there are many NSs
> >> downstream of the router. Even if the concurrency of the router can be
> >> improved by increasing the number of handlers, it is still relatively
> >> slow. More threads increase the CPU context-switching time, and in
> >> fact many of the handler threads are in a blocked state, which is
> >> undoubtedly a waste of thread resources.
> >> When a request enters the router, there is no guarantee that a handler
> >> will be free to process it at that time.
> >>
> >> Therefore, I am proposing asynchronous router RPC. Please see the
> >> issue https://issues.apache.org/jira/browse/HDFS-17531 for the
> >> complete solution.
> >>
> >> You can also look at this PR:
> >> https://github.com/apache/hadoop/pull/6838, which is just a demo, but
> >> it completes the core asynchronous RPC function. If you think
> >> asynchronous routing is feasible, we can consider splitting this PR
> >> for easier review in the future.
> >>
> >> The PDF is attached and can also be viewed through the issue.
> >>
> >> Everyone is welcome to join the discussion!
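
To make the handler discussion quoted above concrete, here is a minimal sketch of the two models. It is not taken from the PR or from the router code: the class AsyncHandlerSketch and the methods callNamespace, handleSync and handleAsync are hypothetical names, and a timer thread only simulates a downstream NS that answers after a fixed delay. It uses nothing beyond the JDK 8 CompletableFuture API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Hadoop.
class AsyncHandlerSketch {

    // Simulated downstream NS (an assumption): the returned future is
    // completed by the timer thread after delayMs, so no thread is held
    // while the call is "in flight".
    static CompletableFuture<String> callNamespace(
            ScheduledExecutorService timer, int callId, long delayMs) {
        CompletableFuture<String> response = new CompletableFuture<>();
        timer.schedule(() -> response.complete("response-" + callId),
                delayMs, TimeUnit.MILLISECONDS);
        return response;
    }

    // Synchronous model: the calling thread plays the single handler and
    // blocks on every call before it takes the next one.
    static List<String> handleSync(int calls, long delayMs) {
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        try {
            List<String> responses = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                responses.add(callNamespace(timer, i, delayMs).join());
            }
            return responses;
        } finally {
            timer.shutdown();
        }
    }

    // Asynchronous model: the same single handler only issues the calls
    // and moves on; it waits once, at the end, for all of them.
    static List<String> handleAsync(int calls, long delayMs) {
        ScheduledExecutorService timer =
                Executors.newSingleThreadScheduledExecutor();
        try {
            List<CompletableFuture<String>> inFlight = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                inFlight.add(callNamespace(timer, i, delayMs));
            }
            CompletableFuture
                    .allOf(inFlight.toArray(new CompletableFuture[0]))
                    .join();
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> f : inFlight) {
                responses.add(f.join()); // already complete, does not block
            }
            return responses;
        } finally {
            timer.shutdown();
        }
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        int syncCount = handleSync(10, 50).size();
        long t1 = System.nanoTime();
        int asyncCount = handleAsync(10, 50).size();
        long t2 = System.nanoTime();
        System.out.println("sync:  " + syncCount + " responses in "
                + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("async: " + asyncCount + " responses in "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

With one handler and 10 calls, the synchronous version should take roughly 10 times the simulated delay, while the asynchronous version should take roughly one delay, because all 10 calls are outstanding at once. A real router would also have to deal with ordering, error handling and back-pressure, which this sketch leaves out.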