Any updates on this issue? It seems that all HDFS Jenkins builds are still failing.
Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> I think the problem started from here:
>
> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>
> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
> But in this run, ReplicationMonitor hit an NPE and received a terminate
> signal, due to which MiniDFSCluster.shutdown() threw an exception.
>
> But TestDataNodeVolumeFailure#tearDown() restores those permissions
> after shutting down the cluster. So in this case, IMO, the permissions
> were never restored:
>
>   @After
>   public void tearDown() throws Exception {
>     if (data_fail != null) {
>       FileUtil.setWritable(data_fail, true);
>     }
>     if (failedDir != null) {
>       FileUtil.setWritable(failedDir, true);
>     }
>     if (cluster != null) {
>       cluster.shutdown();
>     }
>     for (int i = 0; i < 3; i++) {
>       FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
>       FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
>     }
>   }
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
>> When I look at the history of these kinds of builds, all of them failed
>> on node H9.
>>
>> I think some uncommitted patch or other created the problem and left it
>> there.
>>
>> Regards,
>> Vinay
>>
>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
>>> You could rely on a destructive git clean call instead of maven to do
>>> the directory removal.
>>>
>>> --
>>> Sean
>>>
>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>> Is there a maven plugin or setting we can use to simply remove
>>>> directories that have no executable permissions on them?
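One way to harden the tearDown() quoted above is to restore the permissions in a finally block, so the restoration runs even when cluster.shutdown() throws. A self-contained sketch of that ordering, using plain java.io.File as a stand-in for the Hadoop FileUtil/MiniDFSCluster calls (all names here are illustrative, not the actual test code):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class TearDownOrdering {

    // Stand-in for MiniDFSCluster.shutdown() dying abnormally,
    // e.g. after the ReplicationMonitor NPE described above.
    static void shutdownCluster() {
        throw new RuntimeException("simulated shutdown failure");
    }

    // tearDown variant: permissions are restored in a finally block,
    // so they come back even if shutdown() throws.
    static void tearDown(File dataDir) {
        try {
            shutdownCluster();
        } finally {
            dataDir.setWritable(true);
            dataDir.setExecutable(true);
        }
    }

    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("data3").toFile();
        dataDir.setExecutable(false); // what the volume-failure tests do

        try {
            tearDown(dataDir);
        } catch (RuntimeException expected) {
            // shutdown "failed", but the finally block already ran
        }

        System.out.println("executable restored: " + dataDir.canExecute());
        // prints "executable restored: true"
        dataDir.delete();
    }
}
```

With this ordering, a shutdown failure can no longer leave a 500-perms directory behind for the next build on the node.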
>>>> Clearly we have the permission to do this from a technical point of
>>>> view (since we created the directories as the jenkins user); it's
>>>> simply that the code refuses to do it.
>>>>
>>>> Otherwise I guess we can just fix those tests...
>>>>
>>>> Colin
>>>>
>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>
>>>>> In HDFS-7722:
>>>>> The TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>>>>> tearDown().
>>>>> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>
>>>>> Also, I ran mvn test several times on my machine and all tests passed.
>>>>>
>>>>> However, DiskChecker#checkDirAccess() starts with a directory check:
>>>>>
>>>>>   private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>     if (!dir.isDirectory()) {
>>>>>       throw new DiskErrorException("Not a directory: " + dir.toString());
>>>>>     }
>>>>>
>>>>>     checkAccessByFileMethods(dir);
>>>>>   }
>>>>>
>>>>> So one potentially safer alternative is replacing the data dir with a
>>>>> regular file to simulate disk failures.
>>>>>
>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>> TestDataNodeVolumeFailureToleration all remove executable permissions
>>>>>> from directories like the one Colin mentioned to simulate disk
>>>>>> failures at data nodes. I reviewed the code for all of those, and
>>>>>> they all appear to be doing the necessary work to restore executable
>>>>>> permissions at the end of the test. The only recent uncommitted
>>>>>> patch I've seen that makes changes in these test suites is HDFS-7722.
>>>>>> That patch still looks fine though.
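Lei's regular-file alternative above would sidestep the permission problem entirely: a plain file already fails the isDirectory() guard, so no chmod is needed and a crashed test cannot leave the workspace undeletable. A self-contained sketch in plain java.io (checkDir mirrors the quoted guard; the rest is illustrative, not the actual test code):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FakeDiskFailure {

    // Mirrors the guard at the top of DiskChecker#checkDirAccess quoted above.
    static void checkDir(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
    }

    public static void main(String[] args) throws IOException {
        File dataDir = Files.createTempDirectory("data").toFile();

        // Simulate a failed volume by replacing the directory with a
        // regular file; no permission bits are touched.
        dataDir.delete();
        dataDir.createNewFile();

        try {
            checkDir(dataDir);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // "Not a directory: ..."
        }

        dataDir.delete(); // trivial cleanup, no chmod needed
    }
}
```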
I >>> > >> don¹t know if there are other uncommitted patches that changed these >>> > test >>> > >> suites. >>> > >> >>> > >> I suppose it¹s also possible that the JUnit process unexpectedly died >>> > >> after removing executable permissions but before restoring them. >>> That >>> > >> always would have been a weakness of these test suites, regardless of >>> > any >>> > >> recent changes. >>> > >> >>> > >> Chris Nauroth >>> > >> Hortonworks >>> > >> http://hortonworks.com/ >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> >>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote: >>> > >> >>> > >>>Hey Colin, >>> > >>> >>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on >>> with >>> > >>>these boxes. He took a look and concluded that some perms are being >>> set >>> > in >>> > >>>those directories by our unit tests which are precluding those files >>> > from >>> > >>>getting deleted. He's going to clean up the boxes for us, but we >>> should >>> > >>>expect this to keep happening until we can fix the test in question >>> to >>> > >>>properly clean up after itself. >>> > >>> >>> > >>>To help narrow down which commit it was that started this, Andrew >>> sent >>> > me >>> > >>>this info: >>> > >>> >>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS- >>> > >>> >>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ >>> > has >>> > >>>500 perms, so I'm guessing that's the problem. Been that way since >>> 9:32 >>> > >>>UTC >>> > >>>on March 5th." >>> > >>> >>> > >>>-- >>> > >>>Aaron T. Myers >>> > >>>Software Engineer, Cloudera >>> > >>> >>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org >>> > >>> > >>>wrote: >>> > >>> >>> > >>>> Hi all, >>> > >>>> >>> > >>>> A very quick (and not thorough) survey shows that I can't find any >>> > >>>> jenkins jobs that succeeded from the last 24 hours. 
>>>>>>>> Most of them seem to be failing with some variant of this message:
>>>>>>>>
>>>>>>>>   [ERROR] Failed to execute goal
>>>>>>>>   org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>>>>   on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>>>   /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>   -> [Help 1]
>>>>>>>>
>>>>>>>> Any ideas how this happened? Bad disk, unit test setting wrong
>>>>>>>> permissions?
>>>>>>>>
>>>>>>>> Colin
>>>>>
>>>>> --
>>>>> Lei (Eddy) Xu
>>>>> Software Engineer, Cloudera
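The maven-clean failure discussed in this thread can be reproduced in isolation: on POSIX, deleting a child entry requires write and execute permission on its parent directory, so a data dir left with 500 perms blocks the clean until the bits are restored. A minimal sketch in plain java.io (paths are made up; on a typical non-root POSIX run the first delete fails and the second succeeds):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class CleanBlockedDir {
    public static void main(String[] args) throws IOException {
        File parent = Files.createTempDirectory("data3").toFile();
        File child = new File(parent, "current");
        child.mkdir();

        // Leave the parent the way a crashed volume-failure test would:
        // roughly 500 perms, i.e. no write and no execute bits restored.
        parent.setWritable(false);
        parent.setExecutable(false);

        // mvn clean's equivalent of this delete fails the same way.
        System.out.println("delete while locked: " + child.delete());

        // What any fix has to do first (a fixed tearDown, a pre-clean
        // chmod step, or a chmod -R before a destructive git clean):
        parent.setExecutable(true);
        parent.setWritable(true);

        System.out.println("delete after restore: " + child.delete());
        parent.delete();
    }
}
```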