Eric Badger created HDFS-10816:
----------------------------------

             Summary: TestComputeInvalidateWork#testDatanodeReRegistration fails due to race between test and replication monitor
                 Key: HDFS-10816
                 URL: https://issues.apache.org/jira/browse/HDFS-10816
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Eric Badger
            Assignee: Eric Badger


{noformat}
java.lang.AssertionError: Expected invalidate blocks to be the number of DNs expected:<3> but was:<2>
        at org.junit.Assert.fail(Assert.java:88)
        at org.junit.Assert.failNotEquals(Assert.java:743)
        at org.junit.Assert.assertEquals(Assert.java:118)
        at org.junit.Assert.assertEquals(Assert.java:555)
        at org.apache.hadoop.hdfs.server.blockmanagement.TestComputeInvalidateWork.testDatanodeReRegistration(TestComputeInvalidateWork.java:160)
{noformat}

The test fails because of a race between the test and the replication monitor. 
The default replication monitor interval is 3 seconds, which is about how long 
the test normally takes to run. The test deletes a file and then acquires the 
namesystem write lock. If the replication monitor fires between those two 
steps, the monitor invalidates one of the blocks itself, so the test finds 2 
blocks left to invalidate instead of the expected 3 and the assertion fails. 
This is easy to reproduce by removing the sleep() in the ReplicationMonitor's 
run() method in BlockManager.java, so that the replication monitor runs as 
often as possible and widens the race window. 

To fix the test, all that needs to be done is to turn off the replication 
monitor so that it cannot run between the delete and the lock acquisition. 
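
The interleaving described above can be illustrated with a small standalone 
model. This is only a sketch and not HDFS code: the class 
InvalidateRaceSketch, its methods, and the block names are all hypothetical, 
and it assumes (as in the failure above) that one monitor pass takes exactly 
one queued block. The monitor pass is called inline rather than from a thread 
so that the bad ordering is reproduced deterministically.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the race; none of these names come from HDFS. */
public class InvalidateRaceSketch {
    static final int NUM_DATANODES = 3;

    // Stands in for the namesystem lock.
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stands in for the set of blocks waiting to be invalidated.
    private final Deque<String> pendingInvalidates = new ArrayDeque<>();

    /** Step 1 of the test: deleting the file queues one replica per datanode. */
    void deleteFile() {
        lock.writeLock().lock();
        try {
            for (int dn = 0; dn < NUM_DATANODES; dn++) {
                pendingInvalidates.add("blk_1@dn" + dn);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** One pass of the background monitor: it takes one queued replica (assumption). */
    void monitorTick() {
        lock.writeLock().lock();
        try {
            pendingInvalidates.poll();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Step 2 of the test: take the write lock, then count what is still queued. */
    int countUnderWriteLock() {
        lock.writeLock().lock();
        try {
            return pendingInvalidates.size();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs both test steps, optionally letting the monitor fire in the gap between them. */
    static int pendingSeenByTest(boolean monitorFiresInGap) {
        InvalidateRaceSketch s = new InvalidateRaceSketch();
        s.deleteFile();
        if (monitorFiresInGap) {
            s.monitorTick(); // the lock is free here, so nothing stops the monitor
        }
        return s.countUnderWriteLock();
    }

    public static void main(String[] args) {
        System.out.println("monitor off:          " + pendingSeenByTest(false)); // 3
        System.out.println("monitor fires in gap: " + pendingSeenByTest(true));  // 2
    }
}
```

With the monitor out of the picture the count matches the number of datanodes; 
with one monitor pass in the gap it is one short, which matches the 
expected:<3> but was:<2> failure. Holding the lock in each step does not help, 
because the gap is between the two locked sections.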



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
