Sounds good. Thanks for sharing your findings.

On Sat, May 25, 2024 at 2:24 AM zhangjian <1361320...@qq.com> wrote:
> Hello everyone, I ran a performance comparison between the synchronous
> and asynchronous routers. In both single-ns and multi-ns scenarios, the
> asynchronous router does better than the sync router on throughput, CPU
> and thread utilization, and the average processing time of client
> requests. The gap is largest when a downstream ns has a performance
> bottleneck, where the async router far outperforms the sync router. The
> asynchronous router also provides better isolation than the sync router.
>
> Detailed testing PDF: https://issues.apache.org/jira/browse/HDFS-17531
> Comparison of Async router & sync router performance.pdf
>
> On May 24, 2024, at 14:13, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>
> > good job!
> >
> > On Fri, May 24, 2024 at 1:57 AM zhangjian <1361320...@qq.com> wrote:
> >
>> Hello everyone, I have now tested the performance of the async and
>> sync routers against a single downstream ns:
>> 1. The throughput, CPU, and thread performance of the async router are
>> better than those of the sync router, and its memory usage is within
>> an acceptable range compared to the sync router.
>> 2. The asynchronous router can put more pressure on the downstream ns,
>> making better use of its capacity, and can almost fill the call queue
>> of the downstream ns.
>>
>> The test result PDF is too large to send by email; please see:
>> https://issues.apache.org/jira/browse/HDFS-17531
>>
>> > On May 23, 2024, at 17:03, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>> >
>> > Great. Thanks for the additional information.
>> >
>> > cc @Ayush Saxena <ayush...@gmail.com> @inigo...@apache.org
>> > <inigo...@apache.org> Any more feedback on this proposal?
>> >
>> > IMO the asynchronous router RPC feature is a helpful improvement.
>> > Based on my internal practice, it will significantly improve the
>> > throughput of request forwarding, and it is very valuable to push
>> > it forward.
>> > Thanks again and good luck!
>> >
>> > Best Regards,
>> > - He Xiaoqiao
>> >
>> > On Wed, May 22, 2024 at 9:59 AM zhangjian <1361320...@qq.com> wrote:
>> >
>> >> Hi, Sangjin Lee, thank you for your attention. I will use my free
>> >> time to do a performance comparison soon.
>> >>
>> >>> On May 22, 2024, at 03:42, Sangjin Lee <sj...@apache.org> wrote:
>> >>>
>> >>> Thanks for the great proposal, Zhangjian. On point #3, I suspect it
>> >>> should be fairly straightforward to create a small isolated
>> >>> synthetic test to prove (or disprove) the benefits of this approach.
>> >>> By driving a controlled number of requests per second, you could
>> >>> see latency, memory, CPU, etc. Ideally, it should show meaningful
>> >>> improvements without much degradation in other metrics. Would you
>> >>> be able to spend some time doing that?
>> >>>
>> >>> Thanks,
>> >>> Sangjin
>> >>>
>> >>> On Tue, May 21, 2024 at 5:13 AM zhangjian <1361320...@qq.com.invalid> wrote:
>> >>>
>> >>>> Hi, xiaoqiao he, thank you for your reply.
>> >>>>
>> >>>> 1. Currently, the server and client protocols within the router can
>> >>>> be implemented by extending the existing protocols and adding
>> >>>> asynchronous functionality, so the existing synchronous protocols
>> >>>> will not be affected.
>> >>>> RouterClientNamenodeProtocolServerSideTranslatorPB
>> >>>> RouterClientProtocolTranslatorPB
>> >>>> RouterGetUserMappingsProtocolServerSideTranslatorPB
>> >>>> RouterGetUserMappingsProtocolTranslatorPB
>> >>>> RouterNamenodeProtocolServerSideTranslatorPB
>> >>>> RouterNamenodeProtocolTranslatorPB
>> >>>> RouterRefreshUserMappingsProtocolServerSideTranslatorPB
>> >>>> RouterRefreshUserMappingsProtocolTranslatorPB
>> >>>>
>> >>>> The following issues implemented asynchronous callbacks for
>> >>>> Rpc.server, but I have not found any other modules that use the
>> >>>> related functions:
>> >>>> Server: HADOOP-11552, HADOOP-17046
>> >>>> The asynchronous Rpc.client implementation uses this issue directly:
>> >>>> Client: HADOOP-13226
>> >>>> Therefore, I believe the asynchronous router's modifications to the
>> >>>> RPC protocol, RPC server, and RPC client are safe.
>> >>>>
>> >>>> 2. When forwarding a request to multiple downstream ns, the
>> >>>> synchronous router handler adds the requests for the downstream ns
>> >>>> to the thread pool (RouterRpcClient.executorService) and then waits
>> >>>> for responses from all downstream ns before returning. Since the
>> >>>> threads in the thread pool also process RPC requests synchronously,
>> >>>> like a handler, the number of threads in the thread pool directly
>> >>>> affects the performance of invokeConcurrent, which in turn affects
>> >>>> the performance of the handler.
>> >>>> In the asynchronous router implementation, the handler calls
>> >>>> invokeConcurrent simply to convert one request into multiple
>> >>>> requests and add them to the async handler thread pool; the handler
>> >>>> can then process the next request in the call queue. When a
>> >>>> connection thread of a downstream ns receives a response, it hands
>> >>>> the response over to the async response thread for processing.
>> >>>> The async response thread determines whether it has received all
>> >>>> responses from the downstream ns. If it has, it continues
>> >>>> processing the response; otherwise, it moves on to the next
>> >>>> response. The asynchronous router uses CompletableFuture.allOf() to
>> >>>> implement asynchronous invokeConcurrent, and the handler, async
>> >>>> handler, async response, and connection threads still do not need
>> >>>> to wait synchronously.
>> >>>> In addition, synchronous routers have drawbacks not only in
>> >>>> multi-ns environments but also with a single downstream ns: it is
>> >>>> often difficult to decide how many handlers to set for the router.
>> >>>> Setting too many wastes thread resources, and setting too few
>> >>>> cannot put enough pressure on the downstream ns. Asynchronous
>> >>>> routers can push requests to the downstream ns without having to
>> >>>> consider how to set the handlers. Asynchronous routers can also
>> >>>> connect more easily to other downstream storage services that
>> >>>> support the HDFS protocol, giving better scalability.
>> >>>>
>> >>>> 3. Since I have not yet deployed asynchronous routers to our own
>> >>>> cluster, there is no performance comparison. Theoretically, I
>> >>>> believe asynchronous routers will use more memory than synchronous
>> >>>> routers, but I do not believe it will be much, especially since we
>> >>>> can control the maximum number of requests entering the router, and
>> >>>> CompletableFuture is stable and widely used. In other respects, it
>> >>>> should be far superior to synchronous routers, especially in
>> >>>> scenarios with more downstream ns. If anyone is interested, you are
>> >>>> welcome to help with a performance comparison.
>> >>>>
>> >>>>> On May 21, 2024, at 11:39, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>> >>>>>
>> >>>>> Thanks for this great proposal!
>> >>>>>
>> >>>>> Some questions after reviewing the design doc (sorry, I didn't
>> >>>>> review the PR carefully, as it is too large):
>> >>>>> 1. This solution involves an RPC framework update. Will it affect
>> >>>>> other modules, and how do we keep other modules isolated from
>> >>>>> these changes?
>> >>>>> 2. Some RPC requests should be forwarded concurrently to all
>> >>>>> downstream NS. Does this solution cover that case?
>> >>>>> 3. Considering there is an initial implementation, did you collect
>> >>>>> any benchmarks against the current synchronous model of DFSRouter?
>> >>>>> Thanks again.
>> >>>>>
>> >>>>> Best Regards,
>> >>>>> - He Xiaoqiao
>> >>>>>
>> >>>>> On Tue, May 21, 2024 at 11:21 AM zhangjian <1361320...@qq.com.invalid> wrote:
>> >>>>>
>> >>>>>> Thank you for your positive attitude towards this feature. You can
>> >>>>>> debug the UTs provided in the PR to better understand the current
>> >>>>>> asynchronous calling function.
>> >>>>>>
>> >>>>>>> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <simbadz...@apache.org> wrote:
>> >>>>>>>
>> >>>>>>> Excited to see this feature as well. I'll spend more time
>> >>>>>>> understanding the proposal and implementation.
>> >>>>>>>
>> >>>>>>> On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid> wrote:
>> >>>>>>>
>> >>>>>>>> Hi, Yuanbo liu, thank you for your interest in this feature. I
>> >>>>>>>> think the difficulty of an asynchronous router lies not only in
>> >>>>>>>> implementing the asynchronous functions but also in keeping the
>> >>>>>>>> code readable and reusable, so as to facilitate development by
>> >>>>>>>> the community.
>> >>>>>>>> I also planned at the beginning to use the virtual threads you
>> >>>>>>>> mentioned. Virtual threads can achieve asynchrony elegantly at
>> >>>>>>>> the code level, but the biggest problem is that upgrading the JDK
>> >>>>>>>> version is not easy, whether in the community or in actual
>> >>>>>>>> production environments. Therefore, I later used
>> >>>>>>>> CompletableFuture, which is supported by JDK 8, to achieve
>> >>>>>>>> asynchrony. The router is stateless, and the router RPC process
>> >>>>>>>> is very clear. Therefore, even if CompletableFuture itself is not
>> >>>>>>>> as readable as virtual threads, if we design it well, we can make
>> >>>>>>>> the asynchronous process look very clear.
>> >>>>>>>>
>> >>>>>>>>> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Nice to see this feature brought up. I tried to implement this
>> >>>>>>>>> feature in our internal clusters, and I know that it's a very
>> >>>>>>>>> complicated feature. CC hdfs-dev to bring more discussion.
>> >>>>>>>>> By the way, I'm not sure whether the virtual threads of newer
>> >>>>>>>>> JDKs will help in this case.
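
To make the CompletableFuture approach discussed in this thread more concrete, here is a minimal JDK 8 sketch of fanning one request out to several downstream nameservices and combining the responses with CompletableFuture.allOf(). It is not the code from the PR: the class InvokeConcurrentSketch, the helper callNameservice, and the simplified invokeConcurrent signature are hypothetical stand-ins, and the downstream RPC is simulated.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; this is not the Hadoop RouterRpcClient API.
class InvokeConcurrentSketch {

  // Stand-in for an asynchronous RPC to one downstream nameservice (simulated).
  static CompletableFuture<String> callNameservice(String ns, Executor connections) {
    return CompletableFuture.supplyAsync(() -> ns + ":ok", connections);
  }

  // Sends the request to every nameservice and returns immediately. The
  // returned future completes once all downstream responses have arrived.
  static CompletableFuture<Map<String, String>> invokeConcurrent(
      List<String> nameservices, Executor connections) {
    Map<String, CompletableFuture<String>> pending = new LinkedHashMap<>();
    for (String ns : nameservices) {
      pending.put(ns, callNameservice(ns, connections));
    }
    CompletableFuture<?>[] all = pending.values().toArray(new CompletableFuture<?>[0]);
    return CompletableFuture.allOf(all).thenApply(ignored -> {
      Map<String, String> responses = new LinkedHashMap<>();
      // Every future is already complete here, so join() does not block.
      pending.forEach((ns, future) -> responses.put(ns, future.join()));
      return responses;
    });
  }

  public static void main(String[] args) {
    ExecutorService connections = Executors.newFixedThreadPool(3);
    // The calling thread is free as soon as invokeConcurrent returns; the
    // callback runs when the last downstream response arrives.
    invokeConcurrent(Arrays.asList("ns0", "ns1", "ns2"), connections)
        .thenAccept(responses -> System.out.println("responses: " + responses))
        .join(); // joined here only so the demo does not exit early
    connections.shutdown();
  }
}
```

The point of the sketch is that no thread waits for the slowest nameservice: the combining step is attached as a callback, which is the property the thread attributes to the asynchronous invokeConcurrent.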
>> >>>>>>>>>
>> >>>>>>>>> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid> wrote:
>> >>>>>>>>>
>> >>>>>>>>>> Hello everyone, currently there are some shortcomings in the RPC
>> >>>>>>>>>> of the HDFS router:
>> >>>>>>>>>>
>> >>>>>>>>>> Currently the router's handler thread is synchronous: when the
>> >>>>>>>>>> *handler* thread adds a call to connection.calls, it must wait
>> >>>>>>>>>> until the *connection* notifies it that the call is complete, and
>> >>>>>>>>>> only after the response is put into the response queue can a new
>> >>>>>>>>>> call be taken from the call queue and processed. Therefore, the
>> >>>>>>>>>> concurrency of the router is limited by the number of handlers.
>> >>>>>>>>>> A simple example: if the number of handlers is 1 and the maximum
>> >>>>>>>>>> number of calls in the connection thread is 10, then even though
>> >>>>>>>>>> the connection thread can send 10 requests to the downstream ns,
>> >>>>>>>>>> the router can only process one request after another because
>> >>>>>>>>>> there is only 1 handler.
>> >>>>>>>>>>
>> >>>>>>>>>> Since the performance of router RPC is mainly limited by the
>> >>>>>>>>>> number of handlers, the most effective way to improve RPC
>> >>>>>>>>>> performance currently is to increase the number of handlers.
>> >>>>>>>>>> However, letting the router create a large number of handler
>> >>>>>>>>>> threads also increases the number of thread switches and cannot
>> >>>>>>>>>> make full use of the machine.
>> >>>>>>>>>>
>> >>>>>>>>>> There are usually multiple ns downstream of the router. If a
>> >>>>>>>>>> handler forwards a request to an ns with poor performance, the
>> >>>>>>>>>> handler will wait for a long time.
>> >>>>>>>>>> With fewer handlers available, the router's ability to handle
>> >>>>>>>>>> requests for ns with normal performance is reduced. From the
>> >>>>>>>>>> client's perspective, the performance of the router's downstream
>> >>>>>>>>>> ns appears to have deteriorated at this point. We often find
>> >>>>>>>>>> that the call queue of the downstream ns is not high, but the
>> >>>>>>>>>> call queue of the router is very high.
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore, although the main function of the router is to
>> >>>>>>>>>> federate and handle requests for multiple NSs, the current
>> >>>>>>>>>> synchronous RPC performance cannot satisfy scenarios where there
>> >>>>>>>>>> are many NSs downstream of the router. Even if the concurrency of
>> >>>>>>>>>> the router can be improved by increasing the number of handlers,
>> >>>>>>>>>> it is still relatively slow. More threads increase CPU context
>> >>>>>>>>>> switching time, and in fact many of the handler threads are in a
>> >>>>>>>>>> blocked state, which is undoubtedly a waste of thread resources.
>> >>>>>>>>>> When a request enters the router, there is no guarantee that a
>> >>>>>>>>>> handler will be free to process it at that time.
>> >>>>>>>>>>
>> >>>>>>>>>> Therefore, I propose asynchronous router RPC. Please see the
>> >>>>>>>>>> issue https://issues.apache.org/jira/browse/HDFS-17531 for the
>> >>>>>>>>>> complete solution.
>> >>>>>>>>>>
>> >>>>>>>>>> You can also view this PR:
>> >>>>>>>>>> https://github.com/apache/hadoop/pull/6838, which is just a
>> >>>>>>>>>> demo, but it completes the core asynchronous RPC function. If
>> >>>>>>>>>> you think asynchronous routing is feasible, we can consider
>> >>>>>>>>>> splitting this PR for easier review in the future.
>> >>>>>>>>>>
>> >>>>>>>>>> The PDF is attached and can also be viewed through the issue.
>> >>>>>>>>>>
>> >>>>>>>>>> Everyone is welcome to join the discussion!
>> >>>>>>>>>>
>> >>>>>>>>
>> >>>>>>>> ---------------------------------------------------------------------
>> >>>>>>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>> >>>>>>>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>> >>>>>>>>
>> >>>>>>
>> >>>>>> ---------------------------------------------------------------------
>> >>>>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> >>>>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
>> >>>>>>
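
The single-handler example from the original proposal in this thread can be simulated in a few lines. The sketch below is only an illustration under stated assumptions: the class HandlerBottleneckSketch and its methods are hypothetical, the downstream ns is replaced by a 100 ms sleep, and the request count of 5 is arbitrary. It measures how many downstream calls are in flight at once when a single handler either blocks on each response or hands the call off and moves on.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical simulation; none of these names come from Hadoop.
class HandlerBottleneckSketch {
  static final int REQUESTS = 5;

  // Runs REQUESTS calls through one handler thread and returns the peak
  // number of downstream calls that were in flight at the same time.
  static int peakInFlight(boolean asyncHandler) {
    ExecutorService handler = Executors.newSingleThreadExecutor();       // 1 handler
    ExecutorService connection = Executors.newFixedThreadPool(REQUESTS); // downstream capacity
    AtomicInteger inFlight = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    CountDownLatch responses = new CountDownLatch(REQUESTS);
    try {
      for (int i = 0; i < REQUESTS; i++) {
        handler.execute(() -> {
          CompletableFuture<Void> response = CompletableFuture.runAsync(() -> {
            peak.accumulateAndGet(inFlight.incrementAndGet(), Math::max);
            sleepMillis(100); // simulated downstream ns latency
            inFlight.decrementAndGet();
          }, connection).thenRun(responses::countDown);
          if (!asyncHandler) {
            response.join(); // synchronous handler blocks until the response arrives
          }
          // An asynchronous handler returns here and takes the next call.
        });
      }
      if (!responses.await(30, TimeUnit.SECONDS)) {
        throw new IllegalStateException("timed out waiting for responses");
      }
      return peak.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      handler.shutdown();
      connection.shutdown();
    }
  }

  static void sleepMillis(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    System.out.println("sync peak in flight:  " + peakInFlight(false));
    System.out.println("async peak in flight: " + peakInFlight(true));
  }
}
```

With the blocking handler the peak is always 1, matching the "one request after another" behavior described in the proposal. With the non-blocking handler several calls overlap, bounded by the connection pool rather than by the handler count.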