Kevin Wikant created HDFS-16664:
---
Summary: Use correct GenerationStamp when invalidating corrupt
block replica
Key: HDFS-16664
URL: https://issues.apache.org/jira/browse/HDFS-16664
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Kevin Wikant
While trying to backport HDFS-16064 to an older Hadoop version, the new unit
test "testDeleteCorruptReplicaForUnderReplicatedBlock" started failing
unexpectedly.
Upon deep diving this unit test failure, I identified a bug in HDFS corrupt
replica invalidation which results in the following datanode exception:
{quote}2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to
localhost/127.0.0.1:61365] WARN datanode.DataNode
(BPServiceActor.java:processCommand(887)) - Error processing datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s):
0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched,
existing replica is blk_1073741825_1001
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
at
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
at
org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
at
org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
at java.lang.Thread.run(Thread.java:750)
{quote} * The issue is that the Namenode is sending wrong generationStamp to
the datanode. By adding some additional logs, I was able to determine the root
cause for this:
the generationStamp sent in the DNA_INVALIDATE is based on the [generationStamp
of the block sent in the block
report|https://github.com/apache/hadoop/blob/8774f178686487007dcf8c418c989b785a529000/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3733]
* the problem is that the datanode with the corrupt block replica (that
receives the DNA_INVALIDATE) is not necissarily the same datanode that sent the
block report
* this can cause the above exception when the corrupt block replica on the
datanode receiving the DNA_INVALIDATE & the block replica on the datanode that
sent the block report have different generationStamps
The solution is to store the corrupt replicas generationStamp in the
CorruptReplicasMap, then to extract this correct generationStamp value when
sending the DNA_INVALIDATE to the datanode
h2. Failed Test - Before the fix
{quote}> mvn test
-Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock
[INFO] Results:
[INFO]
[ERROR] Failures:
[ERROR] TestDecommission.testDeleteCorruptReplicaForUnderReplicatedBlock:2035
Node 127.0.0.1:61366 failed to complete decommissioning. numTrackedNodes=1 ,
numPendingNodes=0 , adminState=Decommission In Progress ,
nodesWithReplica=[127.0.0.1:61366, 127.0.0.1:61419]
{quote}
Logs:
{quote}> cat
target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt |
grep 'Expected Replicas:\|XXX\|FINALIZED\|Block now\|Failed to delete'
2022-07-16 08:07:45,891 [Listener at localhost/61378] INFO
hdfs.TestDecommission
(TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942)) -
Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1
live replica on 127.0.0.1:61366
2022-07-16 08:07:45,913 [Listener at localhost/61378] INFO
hdfs.TestDecommission
(TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974)) -
Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1
decommissioning replica on 127.0.0.1:61366
XXX invalidateBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX postponeBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX invalidateBlock dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addToInvalidates dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addBlocksToBeInvalidated dn=127.0.0.1:61419 , blk=1073741825_1003
XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
XXX DNA_INVALIDATE dn=/127.0.0.1:61419 , blk=1073741825_1003
XXX invalidate(on DN) dn=/127.0.0.1:61419 , invalidBlk=blk_1073741825_1003 ,
blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
2022-07-16 08:07:49,084 [BP-958471676-X-1657973243350 heartbeating to
localhost/127.0.0.1:61365] INFO impl.FsDatasetAsyncDiskService
(FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling
blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
XXX addBlock