Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2024-05-20 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests:

   hadoop.fs.TestFileUtil
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.hdfs.TestLeaseRecovery2
   hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
   hadoop.hdfs.TestDFSClientExcludedNodes
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion
   hadoop.hdfs.TestFileLengthOnClusterRestart
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotBlocksMap
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
   hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes
   hadoop.hdfs.server.federation.router.TestRouterQuota
   hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
   hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
   hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
   hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
   hadoop.mapreduce.v2.app.TestRuntimeEstimators
   hadoop.mapreduce.lib.input.TestLineRecordReader
   hadoop.mapred.TestLineRecordReader
   hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
   hadoop.resourceestimator.service.TestResourceEstimatorService
   hadoop.resourceestimator.solver.impl.TestLpSolver
   hadoop.yarn.sls.TestSLSRunner
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceAllocator
   hadoop.yarn.server.nodemanager.containermanager.linux.resources.TestNumaResourceHandlerImpl
   hadoop.yarn.server.resourcemanager.TestClientRMService
   hadoop.yarn.server.resourcemanager.recovery.TestFSRMStateStore
   hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker

   cc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-compile-javac-root.txt [488K]

   checkstyle:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-mvnsite-root.txt [572K]

   pathlen:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/pathlen.txt [12K]

   pylint:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-patch-pylint.txt [20K]

   shellcheck:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/diff-patch-shellcheck.txt [72K]

   whitespace:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-javadoc-root.txt [36K]

   unit:

      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [220K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [456K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [16K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-app.txt [44K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1398/artifact/

[jira] [Created] (HDFS-17532) Allow router state store cache update to overwrite and delete in parallel

2024-05-20 Thread Felix N (Jira)
Felix N created HDFS-17532:
--

 Summary: Allow router state store cache update to overwrite and delete in parallel
 Key: HDFS-17532
 URL: https://issues.apache.org/jira/browse/HDFS-17532
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, rbf
Reporter: Felix N
Assignee: Felix N


The current implementation of the router state store update is so inefficient
that, when routers were removed and many NameNodeMembership records were
deleted in a short burst, the deletions triggered router safe mode in our
cluster and caused a lot of trouble.

This ticket aims to allow the overwrite part and the delete part of
org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore#overrideExpiredRecords
to run in parallel.

See HDFS-17529 for the other half of this improvement.
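A minimal sketch of the idea in isolation, assuming the overwrite batch and the delete batch touch disjoint records so they can be issued independently. `StateStoreOps`, `putAll` and `removeAll` are hypothetical stand-ins, not the real `CachedRecordStore` or state store driver API; the sketch only shows starting both batches with `CompletableFuture.runAsync` and waiting for both with `allOf`, instead of running them one after the other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public final class ParallelOverrideSketch {

  /** Hypothetical state store operations; NOT the real Hadoop state store API. */
  interface StateStoreOps {
    void putAll(List<String> records);    // overwrite expired records
    void removeAll(List<String> records); // delete records that should be removed
  }

  /**
   * Runs the overwrite batch and the delete batch concurrently and waits for both.
   * A failure in either batch is rethrown by join() as a CompletionException.
   */
  static void overrideExpiredRecords(StateStoreOps store, List<String> toOverwrite,
      List<String> toDelete, ExecutorService pool) {
    CompletableFuture<Void> overwrite =
        CompletableFuture.runAsync(() -> store.putAll(toOverwrite), pool);
    CompletableFuture<Void> delete =
        CompletableFuture.runAsync(() -> store.removeAll(toDelete), pool);
    CompletableFuture.allOf(overwrite, delete).join();
  }

  /** In-memory stand-in that records which batch operations it received. */
  static final class RecordingStore implements StateStoreOps {
    final List<String> log = Collections.synchronizedList(new ArrayList<String>());

    public void putAll(List<String> records) {
      log.add("put:" + records.size());
    }

    public void removeAll(List<String> records) {
      log.add("remove:" + records.size());
    }
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    RecordingStore store = new RecordingStore();
    try {
      overrideExpiredRecords(store, Arrays.asList("nn1", "nn2"),
          Arrays.asList("nn3", "nn4", "nn5"), pool);
    } finally {
      pool.shutdown();
    }
    System.out.println(store.log); // the two entries may appear in either order
  }
}
```

Whether the two batches really are independent, and how a failure in one should affect the other, would need to be checked against the actual overrideExpiredRecords code; this sketch simply surfaces either failure from join().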



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [Discuss] RBF: Aynchronous router RPC.

2024-05-20 Thread zhangjian
Hi Yuanbo Liu, thank you for your interest in this feature. I think the
difficulty of an asynchronous router lies not only in implementing the
asynchronous functions but also in keeping the code readable and reusable, so
that the community can keep developing it. At first I also planned to use the
virtual threads you mentioned. Virtual threads can make the code asynchronous
elegantly, but the biggest problem is that upgrading the JDK version is not
easy, either in the community or in real production environments. Therefore,
I switched to CompletableFuture, which is available in JDK 8, to make the RPC
path asynchronous. The router is stateless and the router RPC flow is very
clear, so even though CompletableFuture itself is not as readable as virtual
threads, a good design can keep the asynchronous flow very clear.
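To illustrate the CompletableFuture idea described above in isolation, here is a minimal sketch; all names are hypothetical and this is not the code in the PR. A single handler loop starts each downstream call with CompletableFuture.supplyAsync on a separate executor and attaches a thenAccept callback that puts the response on a response queue, so the handler never blocks waiting for a downstream reply. downstreamCall only simulates a NameNode RPC with a fixed 50 ms sleep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public final class AsyncHandlerSketch {

  /** Hypothetical stand-in for a downstream NameNode RPC (simulated 50 ms latency). */
  static String downstreamCall(String request) {
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return "response-for-" + request;
  }

  /**
   * A single handler loop: for each call it starts the downstream RPC on a separate
   * executor and registers a callback, then moves straight on to the next call.
   * The callback puts the result on the response queue when the RPC completes.
   */
  static List<String> handleAll(List<String> calls, int downstreamThreads) {
    BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
    ExecutorService downstream = Executors.newFixedThreadPool(downstreamThreads);
    List<CompletableFuture<Void>> pending = new ArrayList<>();
    try {
      for (String call : calls) {
        pending.add(CompletableFuture
            .supplyAsync(() -> downstreamCall(call), downstream)
            .thenAccept(responseQueue::add));
      }
      // Demo only: wait for every callback so the collected responses can be returned.
      CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
    } finally {
      downstream.shutdown();
    }
    return new ArrayList<>(responseQueue);
  }

  public static void main(String[] args) {
    List<String> calls = new ArrayList<>();
    for (int i = 0; i < 10; i++) {
      calls.add("call-" + i);
    }
    long start = System.nanoTime();
    List<String> responses = handleAll(calls, 10);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println(responses.size() + " responses in about " + elapsedMs + " ms");
  }
}
```

With 10 downstream threads the ten simulated calls overlap, so the run should take on the order of one call's latency rather than ten; the exact time depends on the machine. A real router would also need to propagate exceptions and call state through the callbacks, which this sketch leaves out.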


> On May 20, 2024, at 10:56, Yuanbo Liu wrote:
> 
> Nice to see this feature brought up. I tried to implement this feature in
> our internal clusters and know that it's a very complicated feature. CC
> hdfs-dev to bring in more discussion.
> By the way, I'm not sure whether the virtual threads of newer JDKs would
> help in this case.
> 
> On Mon, May 20, 2024 at 10:10 AM zhangjian <1361320...@qq.com.invalid>
> wrote:
> 
>> Hello everyone, the RPC of the HDFS router currently has some
>> shortcomings:
>> 
>> Currently the router's handler threads are synchronous. After a *handler*
>> thread adds a call to connection.calls, it must wait until the
>> *connection* notifies it that the call is complete, and only after the
>> response has been put into the response queue can the handler take a new
>> call from the call queue and process it. Therefore, the router's
>> concurrency is limited by the number of handlers. A simple example: if
>> there is 1 handler and the connection thread allows at most 10 calls,
>> then even though the connection thread could send 10 requests to the
>> downstream ns, the router can only process the requests one after another
>> because there is only 1 handler.
>> 
>> Since router RPC performance is mainly limited by the number of handlers,
>> the most effective way to improve it today is to increase the number of
>> handlers. However, having the router create a large number of handler
>> threads also increases thread switching and still does not make full use
>> of the machine.
>> 
>> There are usually multiple ns downstream of the router. If a handler
>> forwards a request to a poorly performing ns, the handler has to wait for
>> a long time. With fewer handlers available, the router's ability to serve
>> requests for the ns that are performing normally is reduced, so from the
>> client's perspective the router's downstream ns appear to have degraded.
>> We often find that the call queue of the downstream ns is short while the
>> call queue of the router is very long.
>> 
>> Therefore, although the main function of the router is to federate and
>> handle requests for multiple NSs, the current synchronous RPC performance
>> cannot satisfy scenarios with many NSs downstream of the router. Even if
>> the router's concurrency can be improved by increasing the number of
>> handlers, it is still relatively slow: more threads increase CPU
>> context-switching time, and in fact many of the handler threads are
>> blocked, which is clearly a waste of thread resources. When a request
>> enters the router, there is no guarantee that a handler is free to
>> process it at that moment.
>> 
>> 
>> Therefore, I am considering asynchronous router RPC. Please see the issue
>> https://issues.apache.org/jira/browse/HDFS-17531 for the complete
>> solution.
>> 
>> You can also view this PR: https://github.com/apache/hadoop/pull/6838.
>> It is just a demo, but it implements the core asynchronous RPC function.
>> If you think asynchronous routing is feasible, we can consider splitting
>> this PR into smaller pieces for easier review in the future.
>> 
>> The PDF is attached and can also be viewed through the issue.
>> 
>> Everyone is welcome to join the discussion!
>> 
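The one-handler example in the proposal can be made concrete with a back-of-the-envelope model. It is a sketch under simplifying assumptions: every downstream call takes the same latency (the 100 ms figure is invented for illustration), and router CPU time, queueing and latency variance are ignored. Under those assumptions N concurrent calls finish in about ceil(N / concurrency) rounds of that latency, where concurrency is the number of handlers in the synchronous model and, roughly, the number of calls a connection can keep in flight in a non-blocking model.

```java
public final class HandlerThroughputModel {

  /**
   * Idealized time to finish `calls` concurrent requests when at most `concurrency`
   * of them can wait on the downstream NameNode at once, each taking `latencyMillis`.
   * Ignores router CPU time, queueing overhead and latency variance.
   */
  static long makespanMillis(int calls, int concurrency, long latencyMillis) {
    long rounds = (calls + concurrency - 1) / concurrency; // ceiling division
    return rounds * latencyMillis;
  }

  public static void main(String[] args) {
    long latency = 100; // assumed per-call downstream latency in ms
    // Synchronous model: concurrency equals the number of handlers (1 in the example).
    System.out.println("sync, 1 handler: " + makespanMillis(10, 1, latency) + " ms");
    // Non-blocking model: bounded by calls in flight on the connection (10 in the example).
    System.out.println("async, 10 in flight: " + makespanMillis(10, 10, latency) + " ms");
  }
}
```

With these numbers the model gives 1000 ms for one synchronous handler and 100 ms when all 10 calls can be in flight at once; real latencies vary, so the figures only illustrate the shape of the bound.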





Re: [Discuss] RBF: Aynchronous router RPC.

2024-05-20 Thread Simbarashe Dzinamarira
Excited to see this feature as well. I'll spend more time understanding the
proposal and implementation.

On Mon, May 20, 2024 at 7:55 AM zhangjian <1361320...@qq.com.invalid> wrote:

> [...]


[jira] [Created] (HDFS-17533) RBF Tests that use embedded SQL failing unit tests

2024-05-20 Thread Simbarashe Dzinamarira (Jira)
Simbarashe Dzinamarira created HDFS-17533:
-

 Summary: RBF Tests that use embedded SQL failing unit tests
 Key: HDFS-17533
 URL: https://issues.apache.org/jira/browse/HDFS-17533
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Simbarashe Dzinamarira


In the CI runs for RBF, the following two tests are failing:
{noformat}
[ERROR] Failures:
[ERROR] org.apache.hadoop.hdfs.server.federation.router.security.token.TestSQLDelegationTokenSecretManagerImpl.null
[ERROR]   Run 1: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;drop=true
[ERROR]   Run 2: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;drop=true
[ERROR]   Run 3: TestSQLDelegationTokenSecretManagerImpl Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:TokenStore;drop=true
[INFO]
[ERROR] org.apache.hadoop.hdfs.server.federation.store.driver.TestStateStoreMySQL.null
[ERROR]   Run 1: TestStateStoreMySQL Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;drop=true
[ERROR]   Run 2: TestStateStoreMySQL Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;drop=true
[ERROR]   Run 3: TestStateStoreMySQL Multiple Failures (2 failures)
  java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;create=true
  java.lang.RuntimeException: java.sql.SQLException: No suitable driver found for jdbc:derby:memory:StateStore;drop=true
{noformat}
[https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6804/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt]

I believe the fix is to register the driver first:
[https://dev.mysql.com/doc/connector-j/en/connector-j-usagenotes-connect-drivermanager.html]

[https://stackoverflow.com/questions/22384710/java-sql-sqlexception-no-suitable-driver-found-for-jdbcmysql-localhost3306]
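A minimal sketch of that suggestion, assuming the embedded Derby driver is on the test classpath under the classic class name org.apache.derby.jdbc.EmbeddedDriver (an assumption; check the Derby version the tests depend on). Loading a JDBC driver class runs its static initializer, which registers it with java.sql.DriverManager, so DriverManager.getDriver(url) stops throwing "No suitable driver". The helper names are hypothetical, not existing Hadoop test utilities.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public final class DerbyDriverRegistration {

  // Assumption: the classic Derby embedded driver class name; verify it against
  // the Derby release actually used by the hadoop-hdfs-rbf tests.
  static final String DERBY_EMBEDDED_DRIVER = "org.apache.derby.jdbc.EmbeddedDriver";

  /** Loads the class so its static initializer registers it with DriverManager. */
  static boolean registerDriver(String driverClassName) {
    try {
      Class.forName(driverClassName);
      return true;
    } catch (ClassNotFoundException e) {
      return false; // the driver jar is not on the classpath
    }
  }

  /** True if some registered driver accepts the URL (the lookup that failed in CI). */
  static boolean hasSuitableDriver(String url) {
    try {
      DriverManager.getDriver(url);
      return true;
    } catch (SQLException e) {
      return false; // "No suitable driver"
    }
  }

  public static void main(String[] args) throws SQLException {
    String url = "jdbc:derby:memory:TokenStore;create=true";
    if (!registerDriver(DERBY_EMBEDDED_DRIVER)) {
      System.err.println("Derby is not on the classpath; cannot connect to " + url);
      return;
    }
    // Register once (e.g. in a @BeforeClass method) before the first getConnection call.
    try (Connection conn = DriverManager.getConnection(url)) {
      System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
    }
  }
}
```

If registerDriver returns false in the CI environment, the problem is more likely a missing test dependency than registration order, and explicit registration alone would not fix it.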

 






[jira] [Resolved] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-05-20 Thread Haiyang Hu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu resolved HDFS-17464.
---
Fix Version/s: 3.5.0
   Resolution: Resolved

> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







Re: [Discuss] RBF: Aynchronous router RPC.

2024-05-20 Thread zhangjian
Thank you for your positive attitude toward this feature. You can debug the
UTs provided in the PR to better understand how the asynchronous calls
currently work.

> On May 21, 2024, at 02:04, Simbarashe Dzinamarira wrote:
> [...]

Re: [Discuss] RBF: Aynchronous router RPC.

2024-05-20 Thread Xiaoqiao He
Thanks for this great proposal!

Some questions after reviewing the design doc (sorry, I didn't review the PR
carefully because it is too large):
1. This solution involves an update to the RPC framework. Will it affect
other modules, and how can other modules be kept isolated from these changes?
2. Some RPC requests need to be forwarded concurrently to all downstream NSs.
Will this solution cover that case?
3. Since there is an initial implementation, have you collected any
benchmarks against the current synchronous model of DFSRouter?
Thanks again.

Best Regards,
- He Xiaoqiao

On Tue, May 21, 2024 at 11:21 AM zhangjian <1361320...@qq.com.invalid>
wrote:

> [...]

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2024-05-20 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1588/

No changes
