I think the problem started from here:
https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

As Chris mentioned, TestDataNodeVolumeFailure is changing the
permissions. But in this run, ReplicationMonitor hit an NPE and
received a terminate signal, due to which MiniDFSCluster.shutdown()
threw an exception. TestDataNodeVolumeFailure#tearDown() restores the
executable permissions only after shutting down the cluster, so in
this case, IMO, the permissions were never restored:

@After
public void tearDown() throws Exception {
  if (data_fail != null) {
    FileUtil.setWritable(data_fail, true);
  }
  if (failedDir != null) {
    FileUtil.setWritable(failedDir, true);
  }
  if (cluster != null) {
    cluster.shutdown();
  }
  for (int i = 0; i < 3; i++) {
    FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
    FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
  }
}
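One way to make that robust (a minimal sketch, not a committed fix; it
reuses the test's existing cluster, data_fail, failedDir, and dataDir
fields) would be to do the shutdown inside try and all the permission
restores in a finally block, so an exception from shutdown() can no
longer skip them:

import java.io.File;
import org.apache.hadoop.fs.FileUtil;
import org.junit.After;

@After
public void tearDown() throws Exception {
  try {
    if (cluster != null) {
      cluster.shutdown();
    }
  } finally {
    // Runs even when shutdown() throws, so the workspace stays deletable.
    if (data_fail != null) {
      FileUtil.setWritable(data_fail, true);
    }
    if (failedDir != null) {
      FileUtil.setWritable(failedDir, true);
    }
    for (int i = 0; i < 3; i++) {
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
    }
  }
}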
Regards,
Vinay

On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org>
wrote:

> When I see the history of these kinds of builds, all of them failed on
> node H9.
>
> I think some or other uncommitted patch would have created the problem
> and left it there.
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> You could rely on a destructive git clean call instead of maven to do
>> the directory removal.
>>
>> --
>> Sean
>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>
>> > Is there a maven plugin or setting we can use to simply remove
>> > directories that have no executable permissions on them? Clearly we
>> > have the permission to do this from a technical point of view (since
>> > we created the directories as the jenkins user); it's simply that the
>> > code refuses to do it.
>> >
>> > Otherwise I guess we can just fix those tests...
>> >
>> > Colin
>> >
>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> > >
>> > > In HDFS-7722:
>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> > > TearDown().
>> > > TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>> > >
>> > > Also, I ran mvn test several times on my machine and all tests
>> > > passed.
>> > >
>> > > However, since DiskChecker#checkDirAccess() is:
>> > >
>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>> > >   if (!dir.isDirectory()) {
>> > >     throw new DiskErrorException("Not a directory: "
>> > >         + dir.toString());
>> > >   }
>> > >
>> > >   checkAccessByFileMethods(dir);
>> > > }
>> > >
>> > > one potentially safer alternative is replacing the data dir with a
>> > > regular file to simulate disk failures.
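A minimal sketch of that alternative (hypothetical helper names, not
from HDFS-7722): swap the data directory for a regular file so that
DiskChecker#checkDirAccess() fails its isDirectory() check, then put
the directory back afterwards. No permissions are changed, so a crashed
test cannot leave the workspace undeletable.

import java.io.File;
import java.io.IOException;

/** Replace dataDir with a plain file so DiskChecker reports a failure. */
static void simulateVolumeFailure(File dataDir) throws IOException {
  File savedDir = new File(dataDir.getParent(), dataDir.getName() + ".bak");
  if (!dataDir.renameTo(savedDir)) {
    throw new IOException("Could not move aside " + dataDir);
  }
  if (!dataDir.createNewFile()) {
    throw new IOException("Could not create stand-in file " + dataDir);
  }
}

/** Undo simulateVolumeFailure(); call this from a finally block. */
static void restoreVolume(File dataDir) throws IOException {
  File savedDir = new File(dataDir.getParent(), dataDir.getName() + ".bak");
  if (!dataDir.delete() || !savedDir.renameTo(dataDir)) {
    throw new IOException("Could not restore " + dataDir);
  }
}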
>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>> > > <cnaur...@hortonworks.com> wrote:
>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> > >> TestDataNodeVolumeFailureReporting, and
>> > >> TestDataNodeVolumeFailureToleration all remove executable
>> > >> permissions from directories like the one Colin mentioned to
>> > >> simulate disk failures at data nodes. I reviewed the code for all
>> > >> of those, and they all appear to be doing the necessary work to
>> > >> restore executable permissions at the end of the test. The only
>> > >> recent uncommitted patch I've seen that makes changes in these
>> > >> test suites is HDFS-7722. That patch still looks fine, though. I
>> > >> don't know if there are other uncommitted patches that changed
>> > >> these test suites.
>> > >>
>> > >> I suppose it's also possible that the JUnit process unexpectedly
>> > >> died after removing executable permissions but before restoring
>> > >> them. That always would have been a weakness of these test suites,
>> > >> regardless of any recent changes.
>> > >>
>> > >> Chris Nauroth
>> > >> Hortonworks
>> > >> http://hortonworks.com/
>> > >>
>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>> > >>
>> > >>> Hey Colin,
>> > >>>
>> > >>> I asked Andrew Bayer, who works with Apache Infra, what's going on
>> > >>> with these boxes. He took a look and concluded that some perms are
>> > >>> being set in those directories by our unit tests which are
>> > >>> precluding those files from getting deleted. He's going to clean
>> > >>> up the boxes for us, but we should expect this to keep happening
>> > >>> until we can fix the test in question to properly clean up after
>> > >>> itself.
>> > >>>
>> > >>> To help narrow down which commit it was that started this, Andrew
>> > >>> sent me this info:
>> > >>>
>> > >>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> > >>> has 500 perms, so I'm guessing that's the problem. Been that way
>> > >>> since 9:32 UTC on March 5th."
>> > >>>
>> > >>> --
>> > >>> Aaron T. Myers
>> > >>> Software Engineer, Cloudera
>> > >>>
>> > >>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>> > >>> <cmcc...@apache.org> wrote:
>> > >>>
>> > >>>> Hi all,
>> > >>>>
>> > >>>> A very quick (and not thorough) survey shows that I can't find
>> > >>>> any jenkins jobs that succeeded in the last 24 hours. Most of
>> > >>>> them seem to be failing with some variant of this message:
>> > >>>>
>> > >>>> [ERROR] Failed to execute goal
>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> > >>>> -> [Help 1]
>> > >>>>
>> > >>>> Any ideas how this happened? Bad disk, unit test setting wrong
>> > >>>> permissions?
>> > >>>>
>> > >>>> Colin
>> > >>>>
>> > >>
>> > >
>> > >
>> > > --
>> > > Lei (Eddy) Xu
>> > > Software Engineer, Cloudera
>> >
>> >