+1 for the git clean command. HDFS-7917 still might be valuable for enabling us to run a few unit tests on Windows that are currently skipped. Let's please keep it open, but it's less urgent.
Thanks!

Chris Nauroth
Hortonworks
http://hortonworks.com/

On 3/16/15, 11:54 AM, "Colin P. McCabe" <cmcc...@apache.org> wrote:
> If all it takes is someone creating a test that makes a directory
> without -x, this is going to happen over and over.
>
> Let's just fix the problem at the root by running "git clean -fqdx" in
> our jenkins scripts. If there are no objections, I will add this in and
> un-break the builds.
>
> best,
> Colin
>
> On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way we simulate disk failures.
>>
>> But I think we still need infrastructure folks to help with the
>> jenkins scripts to clean up the dirs that were left behind today.
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds
>>> are still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B
>>> <vinayakum...@apache.org> wrote:
>>>> I think the problem started from here:
>>>>
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned, TestDataNodeVolumeFailure changes the
>>>> permissions. But in that patch's test run, ReplicationMonitor hit an
>>>> NPE and received the terminate signal, due to which
>>>> MiniDFSCluster.shutdown() threw an exception.
>>>>
>>>> However, TestDataNodeVolumeFailure#tearDown() only restores the
>>>> executable permissions after shutting down the cluster. So in this
>>>> case, IMO, the permissions were never restored:
>>>>
>>>> @After
>>>> public void tearDown() throws Exception {
>>>>   if (data_fail != null) {
>>>>     FileUtil.setWritable(data_fail, true);
>>>>   }
>>>>   if (failedDir != null) {
>>>>     FileUtil.setWritable(failedDir, true);
>>>>   }
>>>>   if (cluster != null) {
>>>>     cluster.shutdown();
>>>>   }
>>>>   for (int i = 0; i < 3; i++) {
>>>>     FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
>>>>     FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
>>>>   }
>>>> }
>>>>
>>>> Regards,
>>>> Vinay
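One way to harden that tearDown() against exactly this failure mode is to restore the permissions in a finally block, so that an exception from cluster.shutdown() cannot skip them. A minimal sketch reusing the fields from the quoted code; this is illustrative only, not the actual HDFS-7917 change:

  @After
  public void tearDown() throws Exception {
    try {
      if (data_fail != null) {
        FileUtil.setWritable(data_fail, true);
      }
      if (failedDir != null) {
        FileUtil.setWritable(failedDir, true);
      }
      if (cluster != null) {
        cluster.shutdown();
      }
    } finally {
      // Runs even if shutdown() throws, so the data dirs are always
      // left traversable for the next build's clean step.
      for (int i = 0; i < 3; i++) {
        FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
        FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
      }
    }
  }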
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B
>>>> <vinayakum...@apache.org> wrote:
>>>>> When I look at the history of these kinds of builds, all of them
>>>>> failed on node H9.
>>>>>
>>>>> I think some uncommitted patch or other created the problem and
>>>>> left it there.
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
>>>>>> You could rely on a destructive git clean call instead of maven
>>>>>> to do the directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>>
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>>>>>> Is there a maven plugin or setting we can use to simply remove
>>>>>>> directories that have no executable permissions on them? Clearly
>>>>>>> we have the permission to do this from a technical point of view
>>>>>>> (since we created the directories as the jenkins user); it's
>>>>>>> simply that the code refuses to do it.
>>>>>>>
>>>>>>> Otherwise I guess we can just fix those tests...
>>>>>>>
>>>>>>> Colin
>>>>>>>
>>>>>>> On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>>>>> Thanks a lot for looking into HDFS-7722, Chris.
>>>>>>>>
>>>>>>>> In HDFS-7722:
>>>>>>>> The TestDataNodeVolumeFailureXXX tests reset the data dir
>>>>>>>> permissions in tearDown().
>>>>>>>> TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>>>>
>>>>>>>> Also, I ran mvn test several times on my machine and all tests
>>>>>>>> passed.
>>>>>>>>
>>>>>>>> However, DiskChecker#checkDirAccess() is:
>>>>>>>>
>>>>>>>> private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>>>>   if (!dir.isDirectory()) {
>>>>>>>>     throw new DiskErrorException("Not a directory: " + dir.toString());
>>>>>>>>   }
>>>>>>>>
>>>>>>>>   checkAccessByFileMethods(dir);
>>>>>>>> }
>>>>>>>>
>>>>>>>> so one potentially safer alternative is to replace the data dir
>>>>>>>> with a regular file to simulate disk failures.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Lei (Eddy) Xu
>>>>>>>> Software Engineer, Cloudera
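A sketch of what that alternative might look like: swap the data directory for a regular file, which makes checkDirAccess() throw "Not a directory" without touching any permission bits, so even a killed JUnit process leaves nothing behind that Jenkins cannot delete. The helper names here are hypothetical, not taken from the HDFS-7917 patch:

  // Hypothetical helpers inside a volume-failure test.
  // FileUtil is org.apache.hadoop.fs.FileUtil; File is java.io.File.
  private void simulateVolumeFailure(File volumeDir) throws IOException {
    // Replace the directory with a plain file of the same name, so
    // DiskChecker#checkDirAccess() fails its isDirectory() check.
    if (!FileUtil.fullyDelete(volumeDir)) {
      throw new IOException("Could not delete " + volumeDir);
    }
    if (!volumeDir.createNewFile()) {
      throw new IOException("Could not create placeholder " + volumeDir);
    }
  }

  private void restoreVolume(File volumeDir) throws IOException {
    // Cleanup is an ordinary delete + mkdir; there are no permission
    // changes that could outlive the test process.
    if (!volumeDir.delete() || !volumeDir.mkdirs()) {
      throw new IOException("Could not restore " + volumeDir);
    }
  }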
>>>>>>>> On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>>>> <cnaur...@hortonworks.com> wrote:
>>>>>>>>> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>>>>> TestDataNodeVolumeFailureReporting, and
>>>>>>>>> TestDataNodeVolumeFailureToleration all remove executable
>>>>>>>>> permissions from directories like the one Colin mentioned to
>>>>>>>>> simulate disk failures at data nodes. I reviewed the code for
>>>>>>>>> all of those, and they all appear to be doing the necessary work
>>>>>>>>> to restore executable permissions at the end of the test. The
>>>>>>>>> only recent uncommitted patch I've seen that makes changes in
>>>>>>>>> these test suites is HDFS-7722. That patch still looks fine,
>>>>>>>>> though. I don't know if there are other uncommitted patches
>>>>>>>>> that changed these test suites.
>>>>>>>>>
>>>>>>>>> I suppose it's also possible that the JUnit process unexpectedly
>>>>>>>>> died after removing executable permissions but before restoring
>>>>>>>>> them. That would always have been a weakness of these test
>>>>>>>>> suites, regardless of any recent changes.
>>>>>>>>>
>>>>>>>>> Chris Nauroth
>>>>>>>>> Hortonworks
>>>>>>>>> http://hortonworks.com/
>>>>>>>>>
>>>>>>>>> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>>>>>>>>>> Hey Colin,
>>>>>>>>>>
>>>>>>>>>> I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>>>>>>> on with these boxes. He took a look and concluded that some
>>>>>>>>>> perms are being set on those directories by our unit tests,
>>>>>>>>>> which is precluding those files from getting deleted. He's
>>>>>>>>>> going to clean up the boxes for us, but we should expect this
>>>>>>>>>> to keep happening until we can fix the test in question to
>>>>>>>>>> properly clean up after itself.
>>>>>>>>>>
>>>>>>>>>> To help narrow down which commit started this, Andrew sent me
>>>>>>>>>> this info:
>>>>>>>>>>
>>>>>>>>>> "/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>>>>>>>>>> has 500 perms, so I'm guessing that's the problem. Been that
>>>>>>>>>> way since 9:32 UTC on March 5th."
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Aaron T. Myers
>>>>>>>>>> Software Engineer, Cloudera
>>>>>>>>>>
>>>>>>>>>> On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>>>>>> <cmcc...@apache.org> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> A very quick (and not thorough) survey shows that I can't find
>>>>>>>>>>> any jenkins jobs that succeeded in the last 24 hours. Most of
>>>>>>>>>>> them seem to be failing with some variant of this message:
>>>>>>>>>>>
>>>>>>>>>>> [ERROR] Failed to execute goal
>>>>>>>>>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>>>>>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>>>>>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>>>>>>> -> [Help 1]
>>>>>>>>>>>
>>>>>>>>>>> Any ideas how this happened? Bad disk, unit test setting wrong
>>>>>>>>>>> permissions?
>>>>>>>>>>>
>>>>>>>>>>> Colin
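On Colin's earlier question about removing such directories programmatically: plain Java can do it, but the cleanup first has to re-grant owner permissions on each directory before it can list and unlink the children, which is the part maven-clean-plugin refuses to do. A hedged sketch of that recovery logic (not an existing Hadoop or Maven utility):

  import java.io.File;
  import java.io.IOException;

  public final class ForceDelete {
    // Recursively delete a tree even when directories have lost owner
    // permissions (like the 500-perm data3 dir above): re-grant rwx
    // before descending, then delete bottom-up.
    public static void delete(File f) throws IOException {
      if (f.isDirectory()) {
        f.setReadable(true);   // needed to list the children
        f.setWritable(true);   // needed to unlink the children
        f.setExecutable(true); // needed to traverse into the directory
        File[] children = f.listFiles();
        if (children != null) {
          for (File child : children) {
            delete(child);
          }
        }
      }
      if (!f.delete()) {
        throw new IOException("Could not delete " + f);
      }
    }
  }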