[jira] [Created] (HDFS-17179) DFSInputStream should report CorruptMetaHeaderException as corruptBlock to NameNode
Bryan Beaudreault created HDFS-17179: Summary: DFSInputStream should report CorruptMetaHeaderException as corruptBlock to NameNode Key: HDFS-17179 URL: https://issues.apache.org/jira/browse/HDFS-17179 Project: Hadoop HDFS Issue Type: Improvement Reporter: Bryan Beaudreault Assignee: Bryan Beaudreault We've been running into some data corruption issues recently. When a ChecksumException is thrown, DFSInputStream correctly reports the block to the NameNode which triggers deletion and re-replication of the replica. It's also possible that we fail to even read the meta header for constructing the checksum. This gets thrown as CorruptMetaHeaderException which is not handled by DFSInputStream. We should handle this similarly to ChecksumException. See stacktrace: {code:java} org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block meta file header is corruptat org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:133) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:129) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:618) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723) ~[hadoop-hdfs-client-3.3.1.jar:?]at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1160) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1132) ~[hadoop-hdfs-client-3.3.1.jar:?] at org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1128) ~[hadoop-hdfs-client-3.3.1.jar:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[?:?] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[?:?] at java.lang.Thread.run(Thread.java:829) ~[?:?]Caused by: org.apache.hadoop.util.InvalidChecksumSizeException: The value -75 does not map to a valid checksum Type at org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190) ~[hadoop-common-3.3.1.jar:?]at org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:159) ~[hadoop-common-3.3.1.jar:?] at org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:131) ~[hadoop-hdfs-client-3.3.1.jar:?] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/ No changes -1 overall The following subsystems voted -1: blanks hadolint pathlen unit xml The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck The following subsystems are considered long running: (runtime bigger than 1h 0m 0s) unit Specific tests: XML : Parsing Error(s): hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml Failed junit tests : hadoop.mapreduce.v2.TestUberAM hadoop.mapreduce.v2.TestMRJobsWithProfiler hadoop.mapreduce.v2.TestMRJobs cc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-compile-cc-root.txt [96K] javac: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-compile-javac-root.txt [12K] blanks: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/blanks-eol.txt [15M] https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/blanks-tabs.txt [2.0M] checkstyle: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-checkstyle-root.txt [13M] hadolint: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-hadolint.txt [20K] pathlen: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-pathlen.txt [16K] pylint: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-pylint.txt [20K] shellcheck: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-shellcheck.txt [24K] xml: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/xml.txt [24K] javadoc: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-javadoc-javadoc-root.txt [244K] unit: https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt [72K] Powered by Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1141/ No changes ERROR: File 'out/email-report.txt' does not exist - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64
For more details, see https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/547/ [Sep 3, 2023, 7:38:21 AM] (github) YARN-6537. Running RM tests against the Router. (#5957) Contributed by Shilun Fan. [Sep 3, 2023, 12:08:40 PM] (github) YARN-11435. [Router] FederationStateStoreFacade is not reinitialized with Router conf. (#5967) Contributed by Shilun Fan. -1 overall The following subsystems voted -1: blanks hadolint mvnsite pathlen spotbugs unit xml The following subsystems voted -1 but were configured to be filtered/ignored: cc checkstyle javac javadoc pylint shellcheck The following subsystems are considered long running: (runtime bigger than 1h 0m 0s) unit Specific tests: XML : Parsing Error(s): hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml spotbugs : module:hadoop-hdfs-project/hadoop-hdfs Redundant nullcheck of oldLock, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory)) Redundant null check at DataStorage.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory)) Redundant null check at DataStorage.java:[line 695] Redundant nullcheck of metaChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long, FileInputStream, FileChannel, String) Redundant null check at MappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long, FileInputStream, FileChannel, String) Redundant null check at MappableBlockLoader.java:[line 138] Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at MemoryMappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at MemoryMappableBlockLoader.java:[line 75] Redundant nullcheck of blockChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at NativePmemMappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long, FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null check at NativePmemMappableBlockLoader.java:[line 85] Redundant nullcheck of metaChannel, which is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,, long, FileInputStream, FileChannel, String) Redundant null check at NativePmemMappableBlockLoader.java:is known to be non-null in org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,, long, FileInputStream, FileChannel, String) Redundant null check at NativePmemMappableBlockLoader.java:[line 130] org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager$UserCounts doesn't override java.util.ArrayList.equals(Object) At RollingWindowManager.java:At RollingWindowManager.java:[line 1] spotbugs : module:hadoop-yarn-project/hadoop-yarn Redundant nullcheck of it, which is known to be non-null in org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.recoverTrackerResources(LocalResourcesTracker, NMStateStoreService$LocalResourceTrackerState)) Redundant null check at ResourceLocalizationService.java:is known to be non-null in org
[jira] [Resolved] (HDFS-16933) A race in SerialNumberMap will cause wrong owner, group and XATTR
[ https://issues.apache.org/jira/browse/HDFS-16933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He resolved HDFS-16933. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > A race in SerialNumberMap will cause wrong owner, group and XATTR > - > > Key: HDFS-16933 > URL: https://issues.apache.org/jira/browse/HDFS-16933 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > If namenode enables parallel fsimage loading, a race that occurs in > SerialNumberMap will cause wrong owner ship for INodes. > {code:java} > public int get(T t) { > if (t == null) { > return 0; > } > Integer sn = t2i.get(t); > if (sn == null) { > // Assume there are two thread with different t, such as: > // T1 with hbase > // T2 with hdfs > // If T1 and T2 get the sn in the same time, they will get the same sn, > such as 10 > sn = current.getAndIncrement(); > if (sn > max) { > current.getAndDecrement(); > throw new IllegalStateException(name + ": serial number map is full"); > } > Integer old = t2i.putIfAbsent(t, sn); > if (old != null) { > current.getAndDecrement(); > return old; > } > // If T1 puts the 10->hbase to the i2t first, T2 will use 10 -> hdfs to > overwrite it. So it will cause that the Inodes will get a wrong owner hdfs, > actual it should be hbase. > i2t.put(sn, t); > } > return sn; > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org