[jira] [Created] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
tomscut created HDFS-16548: -- Summary: Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2 Key: HDFS-16548 URL: https://issues.apache.org/jira/browse/HDFS-16548 Project: Hadoop HDFS Issue Type: Bug Reporter: tomscut
{code:java}
[ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 143.701 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
[ERROR] testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots)  Time elapsed: 6.606 s <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<1>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-16549) Consider using volume level lock for deleting blocks
Yuanbo Liu created HDFS-16549: - Summary: Consider using volume level lock for deleting blocks Key: HDFS-16549 URL: https://issues.apache.org/jira/browse/HDFS-16549 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yuanbo Liu It's great to see that the implementation of the fine-grained lock for the DN has been committed to trunk. FsDatasetImpl.invalidate is called frequently to respond to delete commands from the NN. How about using a volume-level write lock instead of the pool-level write lock to reduce the cost of holding the write lock? cc: [~hexiaoqiao] [~Aiphag0]. Thanks for your great work! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
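[Editorial sketch] A minimal illustration of the volume-scoped locking idea proposed above, assuming one lock per storage volume. Class, method, and field names here are hypothetical and do not mirror the actual FsDatasetImpl code:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative sketch only: per-volume write locks for block deletion,
 * instead of a single block-pool-level write lock.
 */
public class VolumeScopedInvalidator {

  // One lock per storage volume, keyed by a volume/storage ID.
  private final Map<String, ReentrantReadWriteLock> volumeLocks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String volumeId) {
    return volumeLocks.computeIfAbsent(volumeId,
        id -> new ReentrantReadWriteLock(true));
  }

  /** Delete the given replicas, all of which live on a single volume. */
  public void invalidateOnVolume(String volumeId, List<Long> blockIds) {
    ReentrantReadWriteLock.WriteLock lock = lockFor(volumeId).writeLock();
    lock.lock(); // blocks only readers/writers of this one volume
    try {
      for (long blockId : blockIds) {
        // Remove the replica from the in-memory map and schedule the
        // on-disk file deletion asynchronously (details elided).
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}

Because a delete command only touches replicas on specific volumes, deletions on different disks would no longer contend on one pool-wide write lock.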
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/

No changes

-1 overall

The following subsystems voted -1:
    hadolint mvnsite pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    Failed junit tests :
        hadoop.io.compress.snappy.TestSnappyCompressorDecompressor
        hadoop.fs.TestFileUtil
        hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
        hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
        hadoop.hdfs.server.federation.router.TestRouterQuota
        hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
        hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
        hadoop.yarn.server.resourcemanager.TestClientRMService
        hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
        hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
        hadoop.mapreduce.lib.input.TestLineRecordReader
        hadoop.mapred.TestLineRecordReader
        hadoop.mapreduce.TestMapReduceLazyOutput
        hadoop.mapreduce.v2.TestUberAM
        hadoop.mapred.gridmix.TestDistCacheEmulation
        hadoop.yarn.sls.TestSLSRunner
        hadoop.resourceestimator.solver.impl.TestLpSolver
        hadoop.resourceestimator.service.TestResourceEstimatorService

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-compile-cc-root.txt [4.0K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-compile-javac-root.txt [472K]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-checkstyle-root.txt [14M]

    hadolint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-hadolint.txt [4.0K]

    mvnsite:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-mvnsite-root.txt [556K]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/pathlen.txt [12K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-shellcheck.txt [72K]

    whitespace:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/whitespace-eol.txt [12M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/whitespace-tabs.txt [1.3M]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-javadoc-root.txt [40K]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [224K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [428K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt [20K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [112K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [104K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-p
[jira] [Resolved] (HDFS-16526) Add metrics for slow DataNode
[ https://issues.apache.org/jira/browse/HDFS-16526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HDFS-16526. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Add metrics for slow DataNode > - > > Key: HDFS-16526 > URL: https://issues.apache.org/jira/browse/HDFS-16526 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Metrics-html.png > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Add some more metrics for slow datanode operations - FlushOrSync, > PacketResponder send ACK. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
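[Editorial sketch] The Jira above only names the operations being instrumented (FlushOrSync and the PacketResponder ACK send). As a generic, hypothetical illustration of counting operations that exceed a latency threshold, not the code added by HDFS-16526, and with assumed names and threshold:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Illustration only: count DataNode operations that exceed a latency
 * threshold, e.g. flushOrSync() or the PacketResponder ACK send.
 */
public class SlowOpMetrics {

  private static final long SLOW_THRESHOLD_MS = 300; // assumed threshold

  private final LongAdder slowFlushOrSyncCount = new LongAdder();
  private final LongAdder slowAckSendCount = new LongAdder();

  /** Record one flushOrSync call that took {@code elapsedNanos}. */
  public void recordFlushOrSync(long elapsedNanos) {
    if (TimeUnit.NANOSECONDS.toMillis(elapsedNanos) > SLOW_THRESHOLD_MS) {
      slowFlushOrSyncCount.increment();
    }
  }

  /** Record one PacketResponder ACK send that took {@code elapsedNanos}. */
  public void recordAckSend(long elapsedNanos) {
    if (TimeUnit.NANOSECONDS.toMillis(elapsedNanos) > SLOW_THRESHOLD_MS) {
      slowAckSendCount.increment();
    }
  }

  public long getSlowFlushOrSyncCount() {
    return slowFlushOrSyncCount.sum();
  }

  public long getSlowAckSendCount() {
    return slowAckSendCount.sum();
  }
}
{code}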
[jira] [Reopened] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value
[ https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reopened HDFS-16531: - > Avoid setReplication logging an edit record if old replication equals the new > value > --- > > Key: HDFS-16531 > URL: https://issues.apache.org/jira/browse/HDFS-16531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I recently came across a NN log where about 800k setRep calls were made, > setting the replication from 3 to 3 - ie leaving it unchanged. > Even in a case like this, we log an edit record, an audit log, and perform > some quota checks etc. > I believe it should be possible to avoid some of the work if we check for > oldRep == newRep and jump out of the method early. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
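[Editorial sketch] For context, a minimal sketch of the early-exit idea this Jira describes, using stand-in types rather than the real FSDirAttrOp/INodeFile code. Note that, per the later messages in this thread, the actual change interacted badly with snapshot renames and was eventually reverted:

{code:java}
/**
 * Sketch only: skip the edit-log record and quota work when the requested
 * replication equals the current value.
 */
public final class SetReplicationExample {

  public static boolean setReplication(FileState file, short newReplication) {
    short oldReplication = file.getReplication();
    if (oldReplication == newReplication) {
      // Nothing changes: avoid logging an edit record, writing an audit
      // log entry, and re-running quota checks.
      return true;
    }
    file.setReplication(newReplication);
    // ... log the edit record and adjust quota/replication state here ...
    return true;
  }

  /** Minimal stand-in for an INodeFile, for illustration only. */
  public static final class FileState {
    private short replication;

    public FileState(short replication) {
      this.replication = replication;
    }

    public short getReplication() {
      return replication;
    }

    public void setReplication(short replication) {
      this.replication = replication;
    }
  }

  private SetReplicationExample() {
  }
}
{code}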
[jira] [Resolved] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
[ https://issues.apache.org/jira/browse/HDFS-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HDFS-16548. - Resolution: Abandoned This is not a test issue; the production code itself has a problem. The original issue has been reopened, so we can follow up there or revert the original Jira. > Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2 > --- > > Key: HDFS-16548 > URL: https://issues.apache.org/jira/browse/HDFS-16548 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Priority: Major > > It seems to be related to HDFS-16531.
> {code:java}
> [ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 143.701 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
> [ERROR] testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots) Time elapsed: 6.606 s <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<1>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value
[ https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell resolved HDFS-16531. -- Resolution: Abandoned Reverted this change down the branches. Sorry for causing the issue, and thanks to those who jumped in with suggestions to fix it. It was intended to be a simple optimisation, but it's proving too risky to be worth it! > Avoid setReplication logging an edit record if old replication equals the new > value > --- > > Key: HDFS-16531 > URL: https://issues.apache.org/jira/browse/HDFS-16531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > I recently came across a NN log where about 800k setRep calls were made, > setting the replication from 3 to 3, i.e. leaving it unchanged. > Even in a case like this, we log an edit record, an audit log, and perform > some quota checks etc. > I believe it should be possible to avoid some of the work if we check for > oldRep == newRep and jump out of the method early. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/

[Apr 19, 2022 4:37:28 AM] (noreply) HDFS-16538. EC decoding failed due to not enough valid inputs (#4167)
[Apr 19, 2022 5:35:23 AM] (noreply) HDFS-16035. Remove DummyGroupMapping as it is not longer used anywhere. (#4183)

-1 overall

The following subsystems voted -1:
    blanks pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests :
        hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-compile-cc-root.txt [96K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-compile-javac-root.txt [340K]

    blanks:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/blanks-eol.txt [13M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/blanks-tabs.txt [2.0M]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-checkstyle-root.txt [14M]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-pathlen.txt [16K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-shellcheck.txt [28K]

    xml:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/xml.txt [24K]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-javadoc-javadoc-root.txt [400K]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [528K]

Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash
tomscut created HDFS-16550: -- Summary: [SBN read] Improper cache-size for journal node may cause cluster crash Key: HDFS-16550 URL: https://issues.apache.org/jira/browse/HDFS-16550 Project: Hadoop HDFS Issue Type: Bug Reporter: tomscut Assignee: tomscut Attachments: image-2022-04-21-09-54-29-751.png, image-2022-04-21-09-54-57-111.png

When we introduced SBN Read, we hit a problem while upgrading the JournalNodes.

Cluster info: *Active: nn0* *Standby: nn1*

1. Rolling restart of the journal nodes (related config: fs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G).
2. The cluster runs for a while.
3. The active namenode (nn0) shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
4. nn1 is transitioned to the Active state.
5. The new active namenode (nn1) also shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
6. The cluster has crashed.

Related code:
{code:java}
JournaledEditsCache(Configuration conf) {
  capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
      DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
  if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
    Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
        "maximum JVM memory is only %d bytes. It is recommended that you " +
        "decrease the cache size or increase the heap size.",
        capacity, Runtime.getRuntime().maxMemory()));
  }
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
}
{code}

Currently, *fs.journalnode.edit-cache-size.bytes* can be set larger than the memory available to the process. If {*}fs.journalnode.edit-cache-size.bytes > 0.9 * Runtime.getRuntime().maxMemory(){*}, only a warning is logged during JournalNode startup, which is easy for users to overlook. However, after the cluster has been running for some time, this is likely to crash the cluster.

!image-2022-04-21-09-54-57-111.png|width=1227,height=57!

IMO, when {*}fs.journalnode.edit-cache-size.bytes > threshold * Runtime.getRuntime().maxMemory(){*}, we should throw an exception and fail fast, giving users a clear hint to update the related configuration.

-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
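[Editorial sketch] A minimal illustration of the fast-fail being proposed, assuming a 0.9 heap-fraction threshold; the helper below is hypothetical and is not the actual JournaledEditsCache change:

{code:java}
/**
 * Sketch only: reject an oversized edits-cache capacity at startup
 * instead of merely logging a warning.
 */
public final class CacheCapacityCheck {

  private static final double MAX_HEAP_FRACTION = 0.9; // assumed threshold

  public static void validateCapacity(long capacityBytes) {
    long maxMemory = Runtime.getRuntime().maxMemory();
    if (capacityBytes > MAX_HEAP_FRACTION * maxMemory) {
      // Fail fast so a mis-sized fs.journalnode.edit-cache-size.bytes is
      // caught during the rolling restart, not later as quorum timeouts.
      throw new IllegalArgumentException(String.format(
          "Cache capacity is set at %d bytes but maximum JVM memory is only"
              + " %d bytes. Decrease the cache size or increase the heap.",
          capacityBytes, maxMemory));
    }
  }

  private CacheCapacityCheck() {
  }
}
{code}

Throwing at construction time stops the JournalNode before it joins the quorum, which gives the operator the clear hint the Jira asks for.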
[jira] [Resolved] (HDFS-16500) Make asynchronous blocks deletion lock and unlock duration threshold configurable
[ https://issues.apache.org/jira/browse/HDFS-16500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He resolved HDFS-16500. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: (was: 3.3.1, 3.3.2) Resolution: Fixed Committed to trunk. Thanks [~smarthan] for your contributions. > Make asynchronous blocks deletion lock and unlock duration threshold > configurable > - > > Key: HDFS-16500 > URL: https://issues.apache.org/jira/browse/HDFS-16500 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > I have backported the nice feature HDFS-16043 to our internal branch, and it works > well in our testing cluster. > I think it's better to make the fields *_deleteBlockLockTimeMs_* and > *_deleteBlockUnlockIntervalTimeMs_* configurable, so that we can control the > lock and unlock duration.
> {code:java}
> private final long deleteBlockLockTimeMs = 500;
> private final long deleteBlockUnlockIntervalTimeMs = 100;
> {code}
> And we should set the default values smaller to avoid blocking other requests > for a long time when deleting some large directories. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
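[Editorial sketch] One way the two hard-coded thresholds could be read from configuration, keeping the old constants as defaults. The config key names below are hypothetical and not necessarily the ones introduced by HDFS-16500:

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch only: make the asynchronous block-deletion lock/unlock
 * thresholds configurable instead of hard-coded.
 */
public class AsyncDeletionConfig {

  // Hypothetical keys for illustration.
  static final String LOCK_TIME_MS_KEY =
      "dfs.namenode.block.deletion.lock.threshold.ms";
  static final String UNLOCK_INTERVAL_MS_KEY =
      "dfs.namenode.block.deletion.unlock.interval.ms";

  private final long deleteBlockLockTimeMs;
  private final long deleteBlockUnlockIntervalTimeMs;

  public AsyncDeletionConfig(Configuration conf) {
    // Previously hard-coded as 500 and 100; keep those values as defaults.
    this.deleteBlockLockTimeMs = conf.getLong(LOCK_TIME_MS_KEY, 500);
    this.deleteBlockUnlockIntervalTimeMs =
        conf.getLong(UNLOCK_INTERVAL_MS_KEY, 100);
  }

  public long getDeleteBlockLockTimeMs() {
    return deleteBlockLockTimeMs;
  }

  public long getDeleteBlockUnlockIntervalTimeMs() {
    return deleteBlockUnlockIntervalTimeMs;
  }
}
{code}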
[jira] [Created] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
Renukaprasad C created HDFS-16551: - Summary: Backport HADOOP-17588 to 3.3 and other active old branches. Key: HDFS-16551 URL: https://issues.apache.org/jira/browse/HDFS-16551 Project: Hadoop HDFS Issue Type: Task Reporter: Renukaprasad C Assignee: Renukaprasad C This intermittent issue has been fixed in trunk; the same fix needs to be backported to the active branches. In org.apache.hadoop.crypto.CryptoInputStream.close(), when two threads try to close the stream, the second thread fails with an error. The close operation should be synchronized so that multiple threads cannot perform it concurrently. [~Hemanth Boyina] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
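[Editorial sketch] The fix is described above as synchronizing close() so a second caller is harmless. A minimal illustration of that pattern on a generic stream wrapper, not the actual CryptoInputStream code:

{code:java}
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch only: make close() idempotent and safe to call from multiple
 * threads by synchronizing it and tracking the closed state.
 */
public class SafeCloseStream extends InputStream {

  private final InputStream in;
  private boolean closed; // guarded by "this"

  public SafeCloseStream(InputStream in) {
    this.in = in;
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public synchronized void close() throws IOException {
    if (closed) {
      return; // a concurrent second close() becomes a harmless no-op
    }
    in.close(); // release underlying resources exactly once
    closed = true;
  }
}
{code}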