[jira] [Created] (HDFS-16301) Improve BenchmarkThroughput#SIZE naming standardization
JiangHua Zhu created HDFS-16301:
-----------------------------------

             Summary: Improve BenchmarkThroughput#SIZE naming standardization
                 Key: HDFS-16301
                 URL: https://issues.apache.org/jira/browse/HDFS-16301
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: benchmarks, test
    Affects Versions: 2.9.2
            Reporter: JiangHua Zhu

In the BenchmarkThroughput#run() method, there is a local variable named SIZE. Since this variable is used only in local scope, it may be more appropriate to give it a lowercase name, as Java conventions reserve ALL_CAPS names for constants.

public int run(String[] args) throws IOException {
  ..
  long SIZE = conf.getLong("dfsthroughput.file.size",
      10L * 1024 * 1024 * 1024);
  ..
}
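For illustration, a minimal sketch of the suggested rename, mirroring the fragment above (the lowercase name "size" is one possible choice, not a committed patch):

public int run(String[] args) throws IOException {
  ..
  // Lowercase local variable; ALL_CAPS conventionally marks static final constants.
  long size = conf.getLong("dfsthroughput.file.size",
      10L * 1024 * 1024 * 1024); // default: 10 GiB
  ..
}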
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/

[Nov 4, 2021 7:57:35 AM] (Takanobu Asanuma) HDFS-16294.Remove invalid DataNode#CONFIG_PROPERTY_SIMULATED. (#3605)

-1 overall

The following subsystems voted -1:
   asflicense hadolint mvnsite pathlen unit

The following subsystems voted -1 but were configured to be filtered/ignored:
   cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
   unit

Specific tests:

   Failed junit tests:
      hadoop.fs.TestFileUtil
      hadoop.hdfs.server.datanode.TestDataNodeUUID
      hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
      hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.contrib.bkjournal.TestBookKeeperHACheckpoints
      hadoop.hdfs.server.federation.router.TestRouterNamenodeHeartbeat
      hadoop.hdfs.server.federation.router.TestRouterQuota
      hadoop.hdfs.server.federation.resolver.order.TestLocalResolver
      hadoop.hdfs.server.federation.resolver.TestMultipleDestinationResolver
      hadoop.yarn.server.resourcemanager.monitor.invariants.TestMetricsInvariantChecker
      hadoop.yarn.server.resourcemanager.TestClientRMService
      hadoop.mapreduce.jobhistory.TestHistoryViewerPrinter
      hadoop.mapreduce.lib.input.TestLineRecordReader
      hadoop.mapred.TestLineRecordReader
      hadoop.tools.TestDistCpSystem
      hadoop.yarn.sls.TestSLSRunner
      hadoop.resourceestimator.service.TestResourceEstimatorService
      hadoop.resourceestimator.solver.impl.TestLpSolver

   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-compile-cc-root.txt [4.0K]

   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-compile-javac-root.txt [496K]

   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-checkstyle-root.txt [14M]

   hadolint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-patch-hadolint.txt [4.0K]

   mvnsite:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-mvnsite-root.txt [584K]

   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/pathlen.txt [12K]

   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-patch-pylint.txt [48K]

   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-patch-shellcheck.txt [56K]

   shelldocs:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/diff-patch-shelldocs.txt [48K]

   whitespace:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/whitespace-eol.txt [12M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/whitespace-tabs.txt [1.3M]

   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-javadoc-root.txt [32K]

   unit:
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt [232K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [428K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs_src_contrib_bkjournal.txt [12K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt [40K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-common.txt [20K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt [128K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-core.txt [104K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/472/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [104K]
      https://ci-hadoop.apach
[jira] [Created] (HDFS-16302) RBF: RouterRpcFairnessPolicyController record requests handled by each nameservice
Janus Chow created HDFS-16302:
---------------------------------

             Summary: RBF: RouterRpcFairnessPolicyController record requests handled by each nameservice
                 Key: HDFS-16302
                 URL: https://issues.apache.org/jira/browse/HDFS-16302
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Janus Chow
            Assignee: Janus Chow

In HDFS-16296, we added metrics to record the rejected permits for each namespace; it would also be valuable to record the requests handled for each namespace. This ticket is to record those handled requests as well; a rough sketch of the shape such counters could take follows.
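A minimal sketch of one way to keep a thread-safe handled-request count per nameservice. The class, field, and method names below are illustrative assumptions, not the actual RouterRpcFairnessPolicyController API:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Illustrative only: one low-contention counter per nameservice id.
class PerNsHandledCounter {
  private final Map<String, LongAdder> handledPerNs = new ConcurrentHashMap<>();

  // Called after a request for the given nameservice has been handled.
  void recordHandled(String nsId) {
    handledPerNs.computeIfAbsent(nsId, k -> new LongAdder()).increment();
  }

  // Snapshot read used when emitting the metric.
  long handledCount(String nsId) {
    LongAdder counter = handledPerNs.get(nsId);
    return counter == null ? 0L : counter.sum();
  }
}

LongAdder is chosen over AtomicLong here because the counter is incremented on every handled RPC but read only when metrics are scraped, which favors write-heavy workloads.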
[jira] [Created] (HDFS-16303) Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
Kevin Wikant created HDFS-16303:
-----------------------------------

             Summary: Losing over 100 datanodes in state decommissioning results in full blockage of all datanode decommissioning
                 Key: HDFS-16303
                 URL: https://issues.apache.org/jira/browse/HDFS-16303
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 3.3.1, 2.10.1
            Reporter: Kevin Wikant

## Problem Description

The HDFS Namenode class "DatanodeAdminManager" is responsible for decommissioning datanodes. As per this "hdfs-site" configuration:

{quote}Config = dfs.namenode.decommission.max.concurrent.tracked.nodes
Default Value = 100

The maximum number of decommission-in-progress datanodes that will be tracked at one time by the namenode. Tracking a decommission-in-progress datanode consumes additional NN memory proportional to the number of blocks on the datanode. Having a conservative limit reduces the potential impact of decommissioning a large number of nodes at once. A value of 0 means no limit will be enforced.{quote}

The Namenode will only actively track up to 100 datanodes for decommissioning at any given time, so as to avoid Namenode memory pressure. Looking into the "DatanodeAdminManager" code:
* a datanode is only removed from the "tracked.nodes" set when it finishes decommissioning
* a datanode is only added to the "tracked.nodes" set if fewer than 100 datanodes are currently being tracked

So when more than 100 datanodes are being decommissioned at a given time, some of those datanodes will not enter the "tracked.nodes" set until one or more datanodes in "tracked.nodes" finish decommissioning. This is generally not a problem because the datanodes in "tracked.nodes" will eventually finish decommissioning, but there is an edge case where this logic prevents the Namenode from making any forward progress on decommissioning.

If all 100 datanodes in "tracked.nodes" are unable to finish decommissioning, then other datanodes (which may well be decommissionable) will never be added to "tracked.nodes" and therefore will never get the opportunity to be decommissioned. This can occur due to the following issue:

{quote}2021-10-21 12:39:24,048 WARN org.apache.hadoop.hdfs.server.blockmanagement.BlockManager (DatanodeAdminMonitor-0): Node W.X.Y.Z:50010 is dead while in Decommission In Progress. Cannot be safely decommissioned or be in maintenance since there is risk of reduced data durability or data loss. Either restart the failed node or force decommissioning or maintenance by removing, calling refreshNodes, then re-adding to the excludes or host config files.{quote}

If a datanode is lost while decommissioning (for example if the underlying hardware fails or the host is lost), it will remain in state decommissioning forever. If 100 or more datanodes are lost while decommissioning over the Hadoop cluster's lifetime, that is enough to completely fill up the "tracked.nodes" set. With the entire "tracked.nodes" set filled with datanodes that can never finish decommissioning, any datanodes added after that point will never be decommissioned because they will never enter the "tracked.nodes" set. A simplified sketch of this stall follows.
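The following is a simplified, self-contained model of the stall, not the actual DatanodeAdminManager code; the class name, node names, and single "monitor tick" below are illustrative assumptions:

import java.util.ArrayDeque;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;

public class TrackedSetStall {
  // Mirrors dfs.namenode.decommission.max.concurrent.tracked.nodes (default 100).
  static final int MAX_TRACKED = 100;

  public static void main(String[] args) {
    Set<String> tracked = new LinkedHashSet<>();
    Queue<String> pending = new ArrayDeque<>();

    // 100 datanodes that died mid-decommission: they never finish,
    // so they are never removed from the tracked set.
    for (int i = 0; i < MAX_TRACKED; i++) {
      tracked.add("dead-dn-" + i);
    }
    // A healthy datanode waiting to be decommissioned.
    pending.add("live-dn-0");

    // One monitor tick: first remove any nodes that finished decommissioning...
    Iterator<String> it = tracked.iterator();
    while (it.hasNext()) {
      it.next();
      boolean finishedDecommission = false; // dead nodes can never finish
      if (finishedDecommission) {
        it.remove();
      }
    }
    // ...then admit queued nodes only while below the tracking limit.
    while (tracked.size() < MAX_TRACKED && !pending.isEmpty()) {
      tracked.add(pending.poll()); // never executes: tracked stays full
    }

    // Prints [live-dn-0]: the live node is stuck outside the tracked set,
    // and every subsequent tick repeats the same outcome.
    System.out.println("Still pending after the tick: " + pending);
  }
}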
In this scenario:
* the "tracked.nodes" set is filled with datanodes which are lost and cannot be recovered (they can never finish decommissioning, so they are never removed from the set)
* the live datanodes actually being decommissioned are enqueued waiting to enter the "tracked.nodes" set (and are stuck waiting indefinitely)

This means that no progress towards decommissioning the live datanodes will be made unless the user takes the following action:

{quote}Either restart the failed node or force decommissioning or maintenance by removing, calling refreshNodes, then re-adding to the excludes or host config files.{quote}

Ideally, the Namenode should be able to gracefully handle scenarios where the datanodes in the "tracked.nodes" set are not making forward progress towards decommissioning while the enqueued datanodes could.

## Reproduction Steps

* create a Hadoop cluster
* lose (i.e. terminate the host/process forever) over 100 datanodes while they are in state decommissioning
* add additional datanodes to the cluster
* attempt to decommission those new datanodes and observe that they remain stuck in state decommissioning forever

Note that in this example each datanode, over the full history of the cluster, has a unique IP address.
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/

[Nov 4, 2021 1:16:03 AM] (noreply) HDFS-16266. Add remote port information to HDFS audit log (#3538)
[Nov 4, 2021 1:51:11 AM] (noreply) HDFS-16291.Make the comment of INode#ReclaimContext more standardized. (#3602)
[Nov 4, 2021 4:47:41 AM] (noreply) HADOOP-17374. support listObjectV2 (#3587)
[Nov 4, 2021 7:43:25 AM] (noreply) HDFS-16294.Remove invalid DataNode#CONFIG_PROPERTY_SIMULATED. (#3605)
[Nov 4, 2021 2:40:37 PM] (noreply) HADOOP-17873. ABFS: Fix transient failures in ITestAbfsStreamStatistics and ITestAbfsRestOperationException (#3341)
[Nov 4, 2021 4:19:11 PM] (noreply) HDFS-16300. Use libcrypto in Windows for libhdfspp (#3617)

-1 overall

The following subsystems voted -1:
   blanks compile golang mvninstall mvnsite pathlen spotbugs unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
   cc checkstyle javac javadoc pylint shellcheck

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
   unit

Specific tests:

   XML : Parsing Error(s):
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml

   Failed junit tests:
      hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes
      hadoop.hdfs.server.diskbalancer.command.TestDiskBalancerCommand
      hadoop.hdfs.TestHDFSFileSystemContract
      hadoop.fs.http.client.TestHttpFSFWithSWebhdfsFileSystem
      hadoop.fs.http.client.TestHttpFSFWithWebhdfsFileSystem
      hadoop.mapreduce.v2.TestUberAM
      hadoop.mapreduce.v2.TestMRJobsWithProfiler
      hadoop.tools.util.TestDistCpUtils
      hadoop.tools.mapred.lib.TestDynamicInputFormat
      hadoop.yarn.csi.client.TestCsiClient
      hadoop.tools.dynamometer.TestDynamometerInfra
      hadoop.tools.dynamometer.TestDynamometerInfra

   mvninstall:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-mvninstall-root.txt [1.7M]

   compile:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-compile-root.txt [532K]

   cc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-compile-root.txt [532K]

   golang:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-compile-root.txt [532K]

   javac:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-compile-root.txt [532K]

   blanks:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/blanks-eol.txt [13M]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/blanks-tabs.txt [2.0M]

   checkstyle:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/results-checkstyle-root.txt [14M]

   mvnsite:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-mvnsite-root.txt [508K]

   pathlen:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/results-pathlen.txt [16K]

   pylint:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/results-pylint.txt [20K]

   shellcheck:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/results-shellcheck.txt [28K]

   xml:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/xml.txt [24K]

   javadoc:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/patch-javadoc-root.txt [3.8M]

   spotbugs:
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/branch-spotbugs-hadoop-tools_hadoop-azure.txt [4.0K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/branch-spotbugs-hadoop-tools.txt [136K]
      https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/679/artifact/out/branch-spotbugs-root.