[jira] [Created] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
tomscut created HDFS-16548: -- Summary: Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2 Key: HDFS-16548 URL: https://issues.apache.org/jira/browse/HDFS-16548 Project: Hadoop HDFS Issue Type: Bug Reporter: tomscut
{code:java}
[ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 143.701 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
[ERROR] testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots)  Time elapsed: 6.606 s <<< FAILURE!
java.lang.AssertionError: expected:<3> but was:<1>
    at org.junit.Assert.fail(Assert.java:89)
    at org.junit.Assert.failNotEquals(Assert.java:835)
    at org.junit.Assert.assertEquals(Assert.java:647)
    at org.junit.Assert.assertEquals(Assert.java:633)
    at org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
    at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
    at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
    at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
    at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
    at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
    at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
    at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
{code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-16549) Consider using volume level lock for deleting blocks
Yuanbo Liu created HDFS-16549: - Summary: Consider using volume level lock for deleting blocks Key: HDFS-16549 URL: https://issues.apache.org/jira/browse/HDFS-16549 Project: Hadoop HDFS Issue Type: Improvement Reporter: Yuanbo Liu It's great to see that the implementation of the fine-grained lock for the DN has been committed to trunk. FsDatasetImpl.invalidate is called frequently to respond to delete commands from the NN. How about using a volume-level write lock instead of the pool-level write lock to reduce the cost of holding the write lock? cc: [~hexiaoqiao] [~Aiphag0]. Thanks for your great work! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
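[Editorial sketch] A minimal illustration of the volume-scoped locking idea proposed above, assuming one lock per storage volume. Class, method, and field names here are hypothetical and do not mirror the actual FsDatasetImpl code:

{code:java}
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/**
 * Illustrative sketch only: per-volume write locks for block deletion,
 * instead of a single block-pool-level write lock.
 */
public class VolumeScopedInvalidator {

  // One lock per storage volume, keyed by a volume/storage ID.
  private final Map<String, ReentrantReadWriteLock> volumeLocks =
      new ConcurrentHashMap<>();

  private ReentrantReadWriteLock lockFor(String volumeId) {
    return volumeLocks.computeIfAbsent(volumeId,
        id -> new ReentrantReadWriteLock(true));
  }

  /** Delete the given replicas, all of which live on a single volume. */
  public void invalidateOnVolume(String volumeId, List<Long> blockIds) {
    ReentrantReadWriteLock.WriteLock lock = lockFor(volumeId).writeLock();
    lock.lock(); // blocks only readers/writers of this one volume
    try {
      for (long blockId : blockIds) {
        // Remove the replica from the in-memory map and schedule the
        // on-disk file deletion asynchronously (details elided).
      }
    } finally {
      lock.unlock();
    }
  }
}
{code}

Because a delete command only touches replicas on specific volumes, deletions on different disks would no longer contend on one pool-wide write lock.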
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/

No changes

-1 overall

The following subsystems voted -1:
    hadolint mvnsite pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck whitespace

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    Failed junit tests :
        hadoop.io.compress.snappy.TestSnappyCompressorDecompressor
        hadoop.fs.TestFileUtil
        hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
        hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
        hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
        hadoop.hdfs.server.federation.router.TestRouterQuota
        hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
        hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
        hadoop.yarn.server.resourcemanager.TestClientRMService
        hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
        hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
        hadoop.mapreduce.lib.input.TestLineRecordReader
        hadoop.mapred.TestLineRecordReader
        hadoop.mapreduce.TestMapReduceLazyOutput
        hadoop.mapreduce.v2.TestUberAM
        hadoop.mapred.gridmix.TestDistCacheEmulation
        hadoop.yarn.sls.TestSLSRunner
        hadoop.resourceestimator.solver.impl.TestLpSolver
        hadoop.resourceestimator.service.TestResourceEstimatorService

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-compile-cc-root.txt [4.0K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-compile-javac-root.txt [472K]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-checkstyle-root.txt [14M]

    hadolint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-hadolint.txt [4.0K]

    mvnsite:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-mvnsite-root.txt [556K]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/pathlen.txt [12K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/diff-patch-shellcheck.txt [72K]

    whitespace:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/whitespace-eol.txt [12M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/whitespace-tabs.txt [1.3M]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-javadoc-root.txt [40K]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [224K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [428K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [36K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt [20K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [112K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [104K]
        https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/637/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-hs-p
[jira] [Resolved] (HDFS-16526) Add metrics for slow DataNode
[ https://issues.apache.org/jira/browse/HDFS-16526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HDFS-16526. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > Add metrics for slow DataNode > - > > Key: HDFS-16526 > URL: https://issues.apache.org/jira/browse/HDFS-16526 > Project: Hadoop HDFS > Issue Type: Sub-task >Reporter: Renukaprasad C >Assignee: Renukaprasad C >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: Metrics-html.png > > Time Spent: 2h 20m > Remaining Estimate: 0h > > Add some more metrics for slow datanode operations - FlushOrSync, > PacketResponder send ACK. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
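[Editorial sketch] The Jira above only names the operations being instrumented (FlushOrSync and the PacketResponder ACK send). As a generic, hypothetical illustration of counting operations that exceed a latency threshold, not the code added by HDFS-16526, and with assumed names and threshold:

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/**
 * Illustration only: count DataNode operations that exceed a latency
 * threshold, e.g. flushOrSync() or the PacketResponder ACK send.
 */
public class SlowOpMetrics {

  private static final long SLOW_THRESHOLD_MS = 300; // assumed threshold

  private final LongAdder slowFlushOrSyncCount = new LongAdder();
  private final LongAdder slowAckSendCount = new LongAdder();

  /** Record one flushOrSync call that took {@code elapsedNanos}. */
  public void recordFlushOrSync(long elapsedNanos) {
    if (TimeUnit.NANOSECONDS.toMillis(elapsedNanos) > SLOW_THRESHOLD_MS) {
      slowFlushOrSyncCount.increment();
    }
  }

  /** Record one PacketResponder ACK send that took {@code elapsedNanos}. */
  public void recordAckSend(long elapsedNanos) {
    if (TimeUnit.NANOSECONDS.toMillis(elapsedNanos) > SLOW_THRESHOLD_MS) {
      slowAckSendCount.increment();
    }
  }

  public long getSlowFlushOrSyncCount() {
    return slowFlushOrSyncCount.sum();
  }

  public long getSlowAckSendCount() {
    return slowAckSendCount.sum();
  }
}
{code}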
[jira] [Reopened] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value
[ https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena reopened HDFS-16531: - > Avoid setReplication logging an edit record if old replication equals the new > value > --- > > Key: HDFS-16531 > URL: https://issues.apache.org/jira/browse/HDFS-16531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.2.4, 3.3.4 > > Time Spent: 1h 20m > Remaining Estimate: 0h > > I recently came across a NN log where about 800k setRep calls were made, > setting the replication from 3 to 3 - ie leaving it unchanged. > Even in a case like this, we log an edit record, an audit log, and perform > some quota checks etc. > I believe it should be possible to avoid some of the work if we check for > oldRep == newRep and jump out of the method early. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
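[Editorial sketch] For context, a minimal sketch of the early-exit idea this Jira describes, using stand-in types rather than the real FSDirAttrOp/INodeFile code. Note that, per the later messages in this thread, the actual change interacted badly with snapshot renames and was eventually reverted:

{code:java}
/**
 * Sketch only: skip the edit-log record and quota work when the requested
 * replication equals the current value.
 */
public final class SetReplicationExample {

  public static boolean setReplication(FileState file, short newReplication) {
    short oldReplication = file.getReplication();
    if (oldReplication == newReplication) {
      // Nothing changes: avoid logging an edit record, writing an audit
      // log entry, and re-running quota checks.
      return true;
    }
    file.setReplication(newReplication);
    // ... log the edit record and adjust quota/replication state here ...
    return true;
  }

  /** Minimal stand-in for an INodeFile, for illustration only. */
  public static final class FileState {
    private short replication;

    public FileState(short replication) {
      this.replication = replication;
    }

    public short getReplication() {
      return replication;
    }

    public void setReplication(short replication) {
      this.replication = replication;
    }
  }

  private SetReplicationExample() {
  }
}
{code}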
[jira] [Resolved] (HDFS-16548) Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2
[ https://issues.apache.org/jira/browse/HDFS-16548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HDFS-16548. - Resolution: Abandoned This is not a test issue; the production code itself has a problem. The original issue has been reopened, so we can follow up there or revert the original Jira. > Failed unit test testRenameMoreThanOnceAcrossSnapDirs_2 > --- > > Key: HDFS-16548 > URL: https://issues.apache.org/jira/browse/HDFS-16548 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: tomscut >Priority: Major > > It seems to be related to HDFS-16531.
> {code:java}
> [ERROR] Tests run: 44, Failures: 6, Errors: 0, Skipped: 0, Time elapsed: 143.701 s <<< FAILURE! - in org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots
> [ERROR] testRenameMoreThanOnceAcrossSnapDirs_2(org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots) Time elapsed: 6.606 s <<< FAILURE!
> java.lang.AssertionError: expected:<3> but was:<1>
> at org.junit.Assert.fail(Assert.java:89)
> at org.junit.Assert.failNotEquals(Assert.java:835)
> at org.junit.Assert.assertEquals(Assert.java:647)
> at org.junit.Assert.assertEquals(Assert.java:633)
> at org.apache.hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots.testRenameMoreThanOnceAcrossSnapDirs_2(TestRenameWithSnapshots.java:985)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
> at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
> at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
> at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
> at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
> at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
> at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
> at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
> at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
> at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16531) Avoid setReplication logging an edit record if old replication equals the new value
[ https://issues.apache.org/jira/browse/HDFS-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen O'Donnell resolved HDFS-16531. -- Resolution: Abandoned Reverted this change down the branches. Sorry for causing the issue, and thanks to those who jumped in with suggestions to fix it. It was intended to be a simple optimisation, but it's proving too risky to be worth it! > Avoid setReplication logging an edit record if old replication equals the new > value > --- > > Key: HDFS-16531 > URL: https://issues.apache.org/jira/browse/HDFS-16531 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Stephen O'Donnell >Assignee: Stephen O'Donnell >Priority: Major > Labels: pull-request-available > Time Spent: 1h 20m > Remaining Estimate: 0h > > I recently came across a NN log where about 800k setRep calls were made, > setting the replication from 3 to 3, i.e. leaving it unchanged. > Even in a case like this, we log an edit record, an audit log, and perform > some quota checks etc. > I believe it should be possible to avoid some of the work if we check for > oldRep == newRep and jump out of the method early. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/

[Apr 19, 2022 4:37:28 AM] (noreply) HDFS-16538. EC decoding failed due to not enough valid inputs (#4167)
[Apr 19, 2022 5:35:23 AM] (noreply) HDFS-16035. Remove DummyGroupMapping as it is not longer used anywhere. (#4183)

-1 overall

The following subsystems voted -1:
    blanks pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
    cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running (runtime bigger than 1h 0m 0s):
    unit

Specific tests:

    XML : Parsing Error(s):
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
        hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

    Failed junit tests :
        hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots

    cc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-compile-cc-root.txt [96K]

    javac:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-compile-javac-root.txt [340K]

    blanks:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/blanks-eol.txt [13M]
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/blanks-tabs.txt [2.0M]

    checkstyle:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-checkstyle-root.txt [14M]

    pathlen:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-pathlen.txt [16K]

    pylint:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-pylint.txt [20K]

    shellcheck:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-shellcheck.txt [28K]

    xml:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/xml.txt [24K]

    javadoc:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/results-javadoc-javadoc-root.txt [400K]

    unit:
        https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/845/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [528K]

Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org

- To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
[jira] [Created] (HDFS-16550) [SBN read] Improper cache-size for journal node may cause cluster crash
tomscut created HDFS-16550: -- Summary: [SBN read] Improper cache-size for journal node may cause cluster crash Key: HDFS-16550 URL: https://issues.apache.org/jira/browse/HDFS-16550 Project: Hadoop HDFS Issue Type: Bug Reporter: tomscut Assignee: tomscut Attachments: image-2022-04-21-09-54-29-751.png, image-2022-04-21-09-54-57-111.png

When we introduced SBN Read, we hit a problem while upgrading the JournalNodes.

Cluster info: *Active: nn0* *Standby: nn1*

1. Rolling restart of the journal nodes (related config: fs.journalnode.edit-cache-size.bytes=1G, -Xms1G, -Xmx1G).
2. The cluster runs for a while.
3. The active namenode (nn0) shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
4. nn1 is transitioned to the Active state.
5. The new active namenode (nn1) also shuts down because of "Timed out waiting 12ms for a quorum of nodes to respond".
6. The cluster has crashed.

Related code:
{code:java}
JournaledEditsCache(Configuration conf) {
  capacity = conf.getInt(DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_KEY,
      DFSConfigKeys.DFS_JOURNALNODE_EDIT_CACHE_SIZE_DEFAULT);
  if (capacity > 0.9 * Runtime.getRuntime().maxMemory()) {
    Journal.LOG.warn(String.format("Cache capacity is set at %d bytes but " +
        "maximum JVM memory is only %d bytes. It is recommended that you " +
        "decrease the cache size or increase the heap size.",
        capacity, Runtime.getRuntime().maxMemory()));
  }
  Journal.LOG.info("Enabling the journaled edits cache with a capacity " +
      "of bytes: " + capacity);
  ReadWriteLock lock = new ReentrantReadWriteLock(true);
  readLock = new AutoCloseableLock(lock.readLock());
  writeLock = new AutoCloseableLock(lock.writeLock());
  initialize(INVALID_TXN_ID);
}
{code}

Currently, *fs.journalnode.edit-cache-size.bytes* can be set larger than the memory available to the process. If {*}fs.journalnode.edit-cache-size.bytes > 0.9 * Runtime.getRuntime().maxMemory(){*}, only a warning is logged during JournalNode startup, which is easy for users to overlook. However, after the cluster has been running for some time, this is likely to crash the cluster.

!image-2022-04-21-09-54-57-111.png|width=1227,height=57!

IMO, when {*}fs.journalnode.edit-cache-size.bytes > threshold * Runtime.getRuntime().maxMemory(){*}, we should throw an exception and fail fast, giving users a clear hint to update the related configuration.

-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
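[Editorial sketch] A minimal illustration of the fast-fail being proposed, assuming a 0.9 heap-fraction threshold; the helper below is hypothetical and is not the actual JournaledEditsCache change:

{code:java}
/**
 * Sketch only: reject an oversized edits-cache capacity at startup
 * instead of merely logging a warning.
 */
public final class CacheCapacityCheck {

  private static final double MAX_HEAP_FRACTION = 0.9; // assumed threshold

  public static void validateCapacity(long capacityBytes) {
    long maxMemory = Runtime.getRuntime().maxMemory();
    if (capacityBytes > MAX_HEAP_FRACTION * maxMemory) {
      // Fail fast so a mis-sized fs.journalnode.edit-cache-size.bytes is
      // caught during the rolling restart, not later as quorum timeouts.
      throw new IllegalArgumentException(String.format(
          "Cache capacity is set at %d bytes but maximum JVM memory is only"
              + " %d bytes. Decrease the cache size or increase the heap.",
          capacityBytes, maxMemory));
    }
  }

  private CacheCapacityCheck() {
  }
}
{code}

Throwing at construction time stops the JournalNode before it joins the quorum, which gives the operator the clear hint the Jira asks for.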
[jira] [Resolved] (HDFS-16500) Make asynchronous blocks deletion lock and unlock duration threshold configurable
[ https://issues.apache.org/jira/browse/HDFS-16500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He resolved HDFS-16500. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Target Version/s: (was: 3.3.1, 3.3.2) Resolution: Fixed Committed to trunk. Thanks [~smarthan] for your contributions. > Make asynchronous blocks deletion lock and unlock duration threshold > configurable > - > > Key: HDFS-16500 > URL: https://issues.apache.org/jira/browse/HDFS-16500 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Reporter: Chengwei Wang >Assignee: Chengwei Wang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > I have backported the nice feature HDFS-16043 to our internal branch, and it works > well in our testing cluster. > I think it's better to make the fields *_deleteBlockLockTimeMs_* and > *_deleteBlockUnlockIntervalTimeMs_* configurable, so that we can control the > lock and unlock duration.
> {code:java}
> private final long deleteBlockLockTimeMs = 500;
> private final long deleteBlockUnlockIntervalTimeMs = 100;
> {code}
> And we should set the default values smaller to avoid blocking other requests > for a long time when deleting some large directories. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
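[Editorial sketch] One way the two hard-coded thresholds could be read from configuration, keeping the old constants as defaults. The config key names below are hypothetical and not necessarily the ones introduced by HDFS-16500:

{code:java}
import org.apache.hadoop.conf.Configuration;

/**
 * Sketch only: make the asynchronous block-deletion lock/unlock
 * thresholds configurable instead of hard-coded.
 */
public class AsyncDeletionConfig {

  // Hypothetical keys for illustration.
  static final String LOCK_TIME_MS_KEY =
      "dfs.namenode.block.deletion.lock.threshold.ms";
  static final String UNLOCK_INTERVAL_MS_KEY =
      "dfs.namenode.block.deletion.unlock.interval.ms";

  private final long deleteBlockLockTimeMs;
  private final long deleteBlockUnlockIntervalTimeMs;

  public AsyncDeletionConfig(Configuration conf) {
    // Previously hard-coded as 500 and 100; keep those values as defaults.
    this.deleteBlockLockTimeMs = conf.getLong(LOCK_TIME_MS_KEY, 500);
    this.deleteBlockUnlockIntervalTimeMs =
        conf.getLong(UNLOCK_INTERVAL_MS_KEY, 100);
  }

  public long getDeleteBlockLockTimeMs() {
    return deleteBlockLockTimeMs;
  }

  public long getDeleteBlockUnlockIntervalTimeMs() {
    return deleteBlockUnlockIntervalTimeMs;
  }
}
{code}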
[jira] [Created] (HDFS-16551) Backport HADOOP-17588 to 3.3 and other active old branches.
Renukaprasad C created HDFS-16551: - Summary: Backport HADOOP-17588 to 3.3 and other active old branches. Key: HDFS-16551 URL: https://issues.apache.org/jira/browse/HDFS-16551 Project: Hadoop HDFS Issue Type: Task Reporter: Renukaprasad C Assignee: Renukaprasad C This intermittent issue has been fixed in trunk; the same fix needs to be backported to the active branches. In org.apache.hadoop.crypto.CryptoInputStream.close(), when two threads try to close the stream, the second thread fails with an error. The close operation should be synchronized so that multiple threads cannot perform it concurrently. [~Hemanth Boyina] -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
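[Editorial sketch] The fix is described above as synchronizing close() so a second caller is harmless. A minimal illustration of that pattern on a generic stream wrapper, not the actual CryptoInputStream code:

{code:java}
import java.io.IOException;
import java.io.InputStream;

/**
 * Sketch only: make close() idempotent and safe to call from multiple
 * threads by synchronizing it and tracking the closed state.
 */
public class SafeCloseStream extends InputStream {

  private final InputStream in;
  private boolean closed; // guarded by "this"

  public SafeCloseStream(InputStream in) {
    this.in = in;
  }

  @Override
  public int read() throws IOException {
    return in.read();
  }

  @Override
  public synchronized void close() throws IOException {
    if (closed) {
      return; // a concurrent second close() becomes a harmless no-op
    }
    in.close(); // release underlying resources exactly once
    closed = true;
  }
}
{code}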