Hi Sangjin Lee, thank you for your attention. I will use my free time to run a performance comparison in the near future.
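As a possible starting point for that comparison, here is a minimal sketch of an isolated synthetic test. It is not the router code: all names (RouterHandlerSketch, callNs, fanOut, runSync, runAsync) are hypothetical, and a scheduled executor with a fixed delay stands in for the downstream ns. It only contrasts one handler that blocks on each call with a handler that fans calls out and combines them with CompletableFuture.allOf().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical synthetic test; none of these names come from the Hadoop code base.
class RouterHandlerSketch {
    // Assumption: a scheduled pool simulates downstream ns latency (daemon threads so the JVM can exit).
    static final ScheduledExecutorService DOWNSTREAM =
        Executors.newScheduledThreadPool(4, r -> {
            Thread t = new Thread(r);
            t.setDaemon(true);
            return t;
        });

    // Simulated downstream call: the future completes after latencyMs.
    static CompletableFuture<String> callNs(String ns, long latencyMs) {
        CompletableFuture<String> f = new CompletableFuture<>();
        DOWNSTREAM.schedule(() -> f.complete(ns + ":ok"), latencyMs, TimeUnit.MILLISECONDS);
        return f;
    }

    // Asynchronous fan-out: issue every call, then combine with allOf() without blocking.
    static CompletableFuture<List<String>> fanOut(List<String> nameservices, long latencyMs) {
        List<CompletableFuture<String>> calls = new ArrayList<>();
        for (String ns : nameservices) {
            calls.add(callNs(ns, latencyMs));
        }
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
            .thenApply(ignored -> {
                List<String> results = new ArrayList<>();
                for (CompletableFuture<String> c : calls) {
                    results.add(c.join()); // already completed, so this does not block
                }
                return results;
            });
    }

    // One synchronous handler: waits for each response before taking the next call.
    static long runSync(int requests, long latencyMs) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            callNs("ns" + i, latencyMs).join();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    // One asynchronous handler: submits all calls; join() here only stops the clock.
    static long runAsync(int requests, long latencyMs) {
        List<String> nameservices = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            nameservices.add("ns" + i);
        }
        long start = System.nanoTime();
        fanOut(nameservices, latencyMs).join();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("sync  elapsed ms: " + runSync(10, 50));
        System.out.println("async elapsed ms: " + runAsync(10, 50));
    }
}
```

A real comparison would still need to drive a controlled request rate against an actual router and record latency, memory, and CPU; this sketch only shows the shape of the blocking versus non-blocking handler loop.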

> On May 22, 2024, at 03:42, Sangjin Lee <sj...@apache.org> wrote:
>
> Thanks for the great proposal, Zhangjian. On point #3, I suspect it should
> be fairly straightforward to create a small isolated synthetic test to
> prove (or disprove) the benefits of this approach. By driving a controlled
> amount of requests per second, you could see latency, memory, CPU, etc.
> Ideally, it should show meaningful improvements without much degradation in
> other metrics. Would you be able to spend some time doing that?
>
> Thanks,
> Sangjin
>
> On Tue, May 21, 2024 at 5:13 AM zhangjian <1361320...@qq.com.invalid> wrote:
>
>> Hi, xiaoqiao he, thank you for your reply.
>>
>> 1. Currently, the server and client protocols within the router can be
>> implemented by extending the existing protocols and adding asynchronous
>> functionality, so the existing synchronous protocols are not affected:
>> RouterClientNamenodeProtocolServerSideTranslatorPB
>> RouterClientProtocolTranslatorPB
>> RouterGetUserMappingsProtocolServerSideTranslatorPB
>> RouterGetUserMappingsProtocolTranslatorPB
>> RouterNamenodeProtocolServerSideTranslatorPB
>> RouterNamenodeProtocolTranslatorPB
>> RouterRefreshUserMappingsProtocolServerSideTranslatorPB
>> RouterRefreshUserMappingsProtocolTranslatorPB
>>
>> The following issues implemented asynchronous callbacks for Rpc.server, but
>> I have not found any other module that uses the related functions:
>> Server: HADOOP-11552, HADOOP-17046
>> The asynchronous Rpc.client implementation uses this issue directly:
>> Client: HADOOP-13226
>> Therefore, I believe the asynchronous router's changes to the RPC protocol,
>> RPC server, and client are safe.
>>
>> 2. When forwarding a request to multiple downstream ns, the synchronous
>> router handler adds one request per downstream ns to the thread pool
>> (RouterRpcClient.executorService), and then waits for responses from all
>> downstream ns before returning. Since threads in the thread pool also
>> process rpc requests synchronously, similar to a handler, the number of
>> threads in the thread pool directly affects the performance of
>> invokeConcurrent, which in turn affects the performance of the handler.
>> In the asynchronous router implementation, the handler calls
>> invokeConcurrent simply to convert a request into multiple requests and add
>> them to the async handler thread pool; the handler can then process the next
>> request in the call queue. When a connection thread of a downstream ns
>> receives a response, it hands it over to the async response thread for
>> processing. The async response thread checks whether it has received all
>> responses from the downstream ns. If it has, it continues to process the
>> response. Otherwise, it moves on to the next response. The asynchronous
>> router uses CompletableFuture.allOf() to implement asynchronous
>> invokeConcurrent, so the handler, async handler, async response, and
>> connection threads never need to wait synchronously.
>> In addition, synchronous routers have drawbacks not only in multi-ns
>> environments but also with a single downstream ns: it is often difficult to
>> decide how many handlers to configure for the router. Too many wastes thread
>> resources, and too few cannot put enough pressure on the downstream ns.
>> Asynchronous routers can push requests to downstream ns without having to
>> tune the number of handlers. Asynchronous routers can also connect more
>> easily to additional downstream storage services that support the HDFS
>> protocol, which gives better scalability.
>>
>> 3. Since I have not yet deployed asynchronous routers to our own cluster,
>> there is no performance comparison. Theoretically, I believe that
>> asynchronous routers will occupy more memory than synchronous routers.
>> However, I do not believe the increase will be large, especially since we
>> can control the maximum number of requests entering the router, and
>> CompletableFuture is stable and widely used. In other aspects, it should be
>> far superior to synchronous routers, especially in scenarios with more
>> downstream ns. If anyone is interested, you are welcome to help with a
>> performance comparison.
>>
>>> On May 21, 2024, at 11:39, Xiaoqiao He <hexiaoq...@apache.org> wrote:
>>>
>>> Thanks for this great proposal!
>>>
>>> Some questions after reviewing the design doc (sorry, I didn't review the
>>> PR carefully, as it is too large):
>>> 1. This solution involves an update to the RPC framework. Will it affect
>>> other modules, and how do we keep other modules isolated from these changes?
>>> 2. Some RPC requests should be forwarded concurrently to all downstream NS.
>>> Does this solution cover that case?
>>> 3. Since there is an initial implementation, did you collect any benchmarks
>>> against the current synchronous model of DFSRouter?
>>> Thanks again.
>>>
>>> Best Regards,
>>> - He Xiaoqiao
>>>
>>> On Tue, May 21, 2024 at 11:21 AM zhangjian <1361320...@qq.com.invalid>
>>> wrote:
>>>
>>>> Thank you for your positive attitude towards this feature. You can debug
>>>> the UTs provided in the PR to better understand the current asynchronous
>>>> calling function.
>>>>
>>>>> On May 21, 2024, at 02:04, Simbarashe Dzinamarira <simbadz...@apache.org> wrote:
>>>>>
>>>>> Excited to see this feature as well. I'll spend more time understanding
>>>>> the proposal and implementation.
>>>>>
>>>>> On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid>
>>>>> wrote:
>>>>>
>>>>>> Hi, Yuanbo liu, thank you for your interest in this feature. I think the
>>>>>> difficulty of an asynchronous router lies not only in implementing the
>>>>>> asynchronous functions, but also in keeping the code readable and
>>>>>> reusable, so as to facilitate development by the community. At the
>>>>>> beginning I also planned to use the virtual threads you mentioned.
>>>>>> Virtual threads can make the code asynchronous elegantly, but the
>>>>>> biggest problem is that it is not easy to upgrade the jdk version,
>>>>>> either in the community or in actual production environments.
>>>>>> Therefore, I later used CompletableFuture, which is supported by jdk 8,
>>>>>> to make the router asynchronous. The router is stateless, and the router
>>>>>> rpc process is very clear. Therefore, even if CompletableFuture itself
>>>>>> is not as readable as virtual threads, a good design can still make the
>>>>>> asynchronous process look very clear.
>>>>>>
>>>>>>
>>>>>>> On May 20, 2024, at 10:56, Yuanbo Liu <liuyuanb...@gmail.com> wrote:
>>>>>>>
>>>>>>> Nice to see this feature brought up. I tried to implement this feature
>>>>>>> in our internal clusters, and know that it's a very complicated
>>>>>>> feature. CC hdfs-dev to bring more discussion.
>>>>>>> By the way, I'm not sure whether virtual threads in a higher jdk
>>>>>>> version will help in this case.
>>>>>>>
>>>>>>> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello everyone, currently there are some shortcomings in the RPC of
>>>>>>>> the HDFS router:
>>>>>>>>
>>>>>>>> Currently the router's handler thread is synchronous. When the
>>>>>>>> *handler* thread adds a call to connection.calls, it needs to wait
>>>>>>>> until the *connection* notifies it that the call is complete. Only
>>>>>>>> after the response is put into the response queue can a new call be
>>>>>>>> obtained from the call queue and processed. Therefore, the concurrency
>>>>>>>> of the router is limited by the number of handlers. A simple example:
>>>>>>>> if the number of handlers is 1 and the maximum number of calls in the
>>>>>>>> connection thread is 10, then even though the connection thread can
>>>>>>>> send 10 requests to the downstream ns, the router can only process one
>>>>>>>> request after another, because there is only 1 handler.
>>>>>>>>
>>>>>>>> Since the performance of router rpc is mainly limited by the number of
>>>>>>>> handlers, the most effective way to improve rpc performance currently
>>>>>>>> is to increase the number of handlers. However, letting the router
>>>>>>>> create a large number of handler threads also increases the number of
>>>>>>>> thread switches and cannot make full use of the machine.
>>>>>>>>
>>>>>>>> There are usually multiple ns downstream of the router. If a handler
>>>>>>>> forwards a request to an ns with poor performance, the handler waits
>>>>>>>> for a long time. With fewer handlers available, the router's ability
>>>>>>>> to handle requests for ns with normal performance is reduced. From the
>>>>>>>> perspective of the client, the performance of the downstream ns of the
>>>>>>>> router appears to have deteriorated. We often find that the call queue
>>>>>>>> of the downstream ns is not high, but the call queue of the router is
>>>>>>>> very high.
>>>>>>>>
>>>>>>>> Therefore, although the main function of the router is to federate and
>>>>>>>> handle requests from multiple NSs, the current synchronous RPC
>>>>>>>> performance cannot satisfy the scenario where there are many NSs
>>>>>>>> downstream of the router. Even if the concurrency of the router can be
>>>>>>>> improved by increasing the number of handlers, it is still relatively
>>>>>>>> slow. More threads increase the CPU context switching time, and in
>>>>>>>> fact many of the handler threads are in a blocked state, which is
>>>>>>>> undoubtedly a waste of thread resources. When a request enters the
>>>>>>>> router, there is no guarantee that a handler will be available to
>>>>>>>> process it at that time.
>>>>>>>>
>>>>>>>>
>>>>>>>> Therefore, I am considering asynchronous router rpc. Please view the
>>>>>>>> issue https://issues.apache.org/jira/browse/HDFS-17531 for the
>>>>>>>> complete solution.
>>>>>>>>
>>>>>>>> You can also view this PR: https://github.com/apache/hadoop/pull/6838,
>>>>>>>> which is just a demo, but it completes the core asynchronous RPC
>>>>>>>> function. If you think asynchronous routing is feasible, we can
>>>>>>>> consider splitting this PR for easier review in the future.
>>>>>>>>
>>>>>>>> The PDF is attached and can also be viewed through the issue.
>>>>>>>>
>>>>>>>> Welcome everyone to exchange and discuss!
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
>>>>>> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>>>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org