So, as some may have noticed, I slammed the Jenkins servers over the weekend to get some recent patch test runs in JIRA for the bug bash this week. I've had a suspicion for a while now that either the long run times of the hadoop-hdfs module unit tests (typically 2+ hours) or the hdfs tests themselves were related to the patch process directory getting removed out from underneath test-patch.
To test the hypothesis, I submitted all of the non-HDFS patches so that they were first in the queue. Let them run for a very long time. Jenkins bounced back and forth between YARN, MR, and HADOOP. No issues encounters. Added HDFS patches into the mix. BOOM. The dreaded "The patch artifact directory has been removed! “ started to appear here and there. This seems to provide some evidence that, yes, hdfs unit tests are directory or indirectly related to the failures. IMO, I think we need to take a serious look at: * splitting up the hadoop-hdfs module into multiple modules to reduce unit test run times * checking to see if the pre commit hooks in hdfs are different than the rest (I do know that the YARN bits are different and appear to have some bugs as well) * increasing the timeout for jenkins job runs FWIW, I’ve also found some minor things here and there with the rewritten test-patch.sh. JIRAs have been filed. One critical, one major and a handful of minor things.