[jira] [Created] (HDFS-17179) DFSInputStream should report CorruptMetaHeaderException as corruptBlock to NameNode

2023-09-05 Thread Bryan Beaudreault (Jira)
Bryan Beaudreault created HDFS-17179:


 Summary: DFSInputStream should report CorruptMetaHeaderException 
as corruptBlock to NameNode
 Key: HDFS-17179
 URL: https://issues.apache.org/jira/browse/HDFS-17179
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Bryan Beaudreault
Assignee: Bryan Beaudreault


We've been running into some data corruption issues recently. When a 
ChecksumException is thrown, DFSInputStream correctly reports the block to the 
NameNode which triggers deletion and re-replication of the replica. It's also 
possible that we fail to even read the meta header for constructing the 
checksum. This gets thrown as CorruptMetaHeaderException which is not handled 
by DFSInputStream. We should handle this similarly to ChecksumException. See 
stacktrace:

 
{code:java}
org.apache.hadoop.hdfs.server.datanode.CorruptMetaHeaderException: The block 
meta file header is corruptat 
org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:133)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitReplica.(ShortCircuitReplica.java:129)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:618)
 ~[hadoop-hdfs-client-3.3.1.jar:?]  at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:545)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:786)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:723)
 ~[hadoop-hdfs-client-3.3.1.jar:?]at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:483)
 ~[hadoop-hdfs-client-3.3.1.jar:?] at 
org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:360)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:715) 
~[hadoop-hdfs-client-3.3.1.jar:?]  at 
org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1160)
 ~[hadoop-hdfs-client-3.3.1.jar:?]   at 
org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1132) 
~[hadoop-hdfs-client-3.3.1.jar:?] at 
org.apache.hadoop.hdfs.DFSInputStream$2.call(DFSInputStream.java:1128) 
~[hadoop-hdfs-client-3.3.1.jar:?] at 
java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]  at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]  
 at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]  at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) 
~[?:?]   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) 
~[?:?]   at java.lang.Thread.run(Thread.java:829) ~[?:?]Caused by: 
org.apache.hadoop.util.InvalidChecksumSizeException: The value -75 does not map 
to a valid checksum Type  at 
org.apache.hadoop.util.DataChecksum.mapByteToChecksumType(DataChecksum.java:190)
 ~[hadoop-common-3.3.1.jar:?]at 
org.apache.hadoop.util.DataChecksum.newDataChecksum(DataChecksum.java:159) 
~[hadoop-common-3.3.1.jar:?]  at 
org.apache.hadoop.hdfs.server.datanode.BlockMetadataHeader.preadHeader(BlockMetadataHeader.java:131)
 ~[hadoop-hdfs-client-3.3.1.jar:?] {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-09-05 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/

No changes




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

Failed junit tests :

   hadoop.mapreduce.v2.TestUberAM 
   hadoop.mapreduce.v2.TestMRJobsWithProfiler 
   hadoop.mapreduce.v2.TestMRJobs 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-compile-javac-root.txt
 [12K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/blanks-eol.txt
 [15M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-checkstyle-root.txt
 [13M]

   hadolint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-hadolint.txt
 [20K]

   pathlen:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-pathlen.txt
 [16K]

   pylint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-pylint.txt
 [20K]

   shellcheck:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-shellcheck.txt
 [24K]

   xml:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/xml.txt
 [24K]

   javadoc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/results-javadoc-javadoc-root.txt
 [244K]

   unit:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1340/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
 [72K]

Powered by Apache Yetus 0.14.0-SNAPSHOT   https://yetus.apache.org

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-09-05 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/1141/

No changes


ERROR: File 'out/email-report.txt' does not exist

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64

2023-09-05 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/547/

[Sep 3, 2023, 7:38:21 AM] (github) YARN-6537. Running RM tests against the 
Router. (#5957) Contributed by Shilun Fan.
[Sep 3, 2023, 12:08:40 PM] (github) YARN-11435. [Router] 
FederationStateStoreFacade is not reinitialized with Router conf. (#5967) 
Contributed by Shilun Fan.




-1 overall


The following subsystems voted -1:
blanks hadolint mvnsite pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Redundant nullcheck of oldLock, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:[line 695] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:[line 138] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:[line 75] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:[line 85] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:[line 130] 
   
org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager$UserCounts
  doesn't override java.util.ArrayList.equals(Object) At 
RollingWindowManager.java:At RollingWindowManager.java:[line 1] 

spotbugs :

   module:hadoop-yarn-project/hadoop-yarn 
   Redundant nullcheck of it, which is known to be non-null in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.recoverTrackerResources(LocalResourcesTracker,
 NMStateStoreService$LocalResourceTrackerState)) Redundant null check at 
ResourceLocalizationService.java:is known to be non-null in 
org

[jira] [Resolved] (HDFS-16933) A race in SerialNumberMap will cause wrong owner, group and XATTR

2023-09-05 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HDFS-16933.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> A race in SerialNumberMap will cause wrong owner, group and XATTR
> -
>
> Key: HDFS-16933
> URL: https://issues.apache.org/jira/browse/HDFS-16933
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> If namenode enables parallel fsimage loading, a race that occurs in 
> SerialNumberMap will cause wrong owner ship for INodes.
> {code:java}
> public int get(T t) {
>   if (t == null) {
> return 0;
>   }
>   Integer sn = t2i.get(t);
>   if (sn == null) {
> // Assume there are two thread with different t, such as:
> // T1 with hbase
> // T2 with hdfs
> // If T1 and T2 get the sn in the same time, they will get the same sn, 
> such as 10
> sn = current.getAndIncrement();
> if (sn > max) {
>   current.getAndDecrement();
>   throw new IllegalStateException(name + ": serial number map is full");
> }
> Integer old = t2i.putIfAbsent(t, sn);
> if (old != null) {
>   current.getAndDecrement();
>   return old;
> }
> // If T1 puts the 10->hbase to the i2t first, T2 will use 10 -> hdfs to 
> overwrite it. So it will cause that the Inodes will get a wrong owner hdfs, 
> actual it should be hbase.
> i2t.put(sn, t);
>   }
>   return sn;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org