Is the simulation just removing the executable bit on the directory? I'd like to get something I can reproduce locally.
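
For reference, here is a minimal sketch of what I would try locally, assuming the simulation really is just clearing the executable bit on a non-empty data directory. It uses plain java.io.File rather than Hadoop's FileUtil, and the class name, the "data3" directory and the "blk_0" file are placeholders of mine, not names taken from the tests.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class NonExecutableDirRepro {

  /** Best-effort recursive delete; returns true only if the whole tree is gone. */
  static boolean deleteRecursively(File f) {
    File[] children = f.listFiles(); // null for plain files and unreadable dirs
    boolean ok = true;
    if (children != null) {
      for (File child : children) {
        ok &= deleteRecursively(child);
      }
    }
    return f.delete() && ok;
  }

  /** Runs the scenario in a fresh temp dir and returns that (by then deleted) root. */
  static File reproduce() {
    try {
      File root = Files.createTempDirectory("clean-repro").toFile();
      File dataDir = new File(root, "data3");  // placeholder directory name
      File block = new File(dataDir, "blk_0"); // placeholder file name
      if (!dataDir.mkdirs() || !block.createNewFile()) {
        throw new IOException("could not set up " + dataDir);
      }

      // Simulate the failure: drop the executable (search) bit for everyone.
      dataDir.setExecutable(false, false);
      boolean firstTry = deleteRecursively(root);
      System.out.println("clean with bit removed succeeded: " + firstTry);

      // What a teardown is supposed to do: restore the bit, then clean again.
      if (dataDir.exists()) {
        dataDir.setExecutable(true, false);
      }
      boolean secondTry = !root.exists() || deleteRecursively(root);
      System.out.println("clean after restoring bit succeeded: " + secondTry);
      return root;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    reproduce();
  }
}
```

As a non-root user on Linux I would expect the first attempt to fail and the second to succeed; as root the permission bits are not enforced, so the first attempt will likely succeed and this will not reproduce anything.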

On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> I have simulated the problem in my env and verified that both 'git clean
> -xdf' and 'mvn clean' will not remove the directory.
> mvn fails, whereas git simply ignores the problem (it does not even display
> a warning).
>
> Regards,
> Vinay
>
> On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
> > Can someone point me to an example build that is broken?
> >
> > On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bus...@cloudera.com> wrote:
> >
> > > I'm on it. HADOOP-11721
> > >
> > > On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <whe...@apache.org> wrote:
> > >
> > >> +1 for git clean.
> > >>
> > >> Colin, can you please get it in ASAP? Currently due to the jenkins
> > >> issues, we cannot close the 2.7 blockers.
> > >>
> > >> Thanks,
> > >> Haohui
> > >>
> > >> On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmcc...@apache.org> wrote:
> > >> > If all it takes is someone creating a test that makes a directory
> > >> > without -x, this is going to happen over and over.
> > >> >
> > >> > Let's just fix the problem at the root by running "git clean -fqdx" in
> > >> > our jenkins scripts. If there are no objections I will add this in and
> > >> > un-break the builds.
> > >> >
> > >> > best,
> > >> > Colin
> > >> >
> > >> > On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
> > >> >> I filed HDFS-7917 to change the way to simulate disk failures.
> > >> >>
> > >> >> But I think we still need infrastructure folks to help with jenkins
> > >> >> scripts to clean the dirs left today.
> > >> >>
> > >> >> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:
> > >> >>> Any updates on this issue? It seems that all HDFS jenkins builds are
> > >> >>> still failing.
> > >> >>>
> > >> >>> Regards,
> > >> >>> Haohui
> > >> >>>
> > >> >>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakum...@apache.org> wrote:
> > >> >>>> I think the problem started from here.
> > >> >>>>
> > >> >>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
> > >> >>>>
> > >> >>>> As Chris mentioned, TestDataNodeVolumeFailure is changing the permissions.
> > >> >>>> But in this patch, ReplicationMonitor got an NPE and a terminate signal,
> > >> >>>> due to which MiniDFSCluster.shutdown() threw an exception.
> > >> >>>>
> > >> >>>> But TestDataNodeVolumeFailure#teardown() restores those permissions
> > >> >>>> only after shutting down the cluster. So in this case, IMO, the
> > >> >>>> permissions were never restored.
> > >> >>>>
> > >> >>>>   @After
> > >> >>>>   public void tearDown() throws Exception {
> > >> >>>>     if(data_fail != null) {
> > >> >>>>       FileUtil.setWritable(data_fail, true);
> > >> >>>>     }
> > >> >>>>     if(failedDir != null) {
> > >> >>>>       FileUtil.setWritable(failedDir, true);
> > >> >>>>     }
> > >> >>>>     if(cluster != null) {
> > >> >>>>       cluster.shutdown();
> > >> >>>>     }
> > >> >>>>     for (int i = 0; i < 3; i++) {
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
> > >> >>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
> > >> >>>>     }
> > >> >>>>   }
> > >> >>>>
> > >> >>>> Regards,
> > >> >>>> Vinay
> > >> >>>>
> > >> >>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org> wrote:
> > >> >>>>
> > >> >>>>> When I look at the history of these builds, all of them failed on
> > >> >>>>> node H9.
> > >> >>>>>
> > >> >>>>> I think some uncommitted patch created the problem and left it there.
> > >> >>>>>
> > >> >>>>> Regards,
> > >> >>>>> Vinay
> > >> >>>>>
> > >> >>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
> > >> >>>>>
> > >> >>>>>> You could rely on a destructive git clean call instead of maven to do the
> > >> >>>>>> directory removal.
> > >> >>>>>>
> > >> >>>>>> --
> > >> >>>>>> Sean
> > >> >>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
> > >> >>>>>>
> > >> >>>>>> > Is there a maven plugin or setting we can use to simply remove
> > >> >>>>>> > directories that have no executable permissions on them? Clearly we
> > >> >>>>>> > have the permission to do this from a technical point of view (since
> > >> >>>>>> > we created the directories as the jenkins user), it's simply that the
> > >> >>>>>> > code refuses to do it.
> > >> >>>>>> >
> > >> >>>>>> > Otherwise I guess we can just fix those tests...
> > >> >>>>>> >
> > >> >>>>>> > Colin
> > >> >>>>>> >
> > >> >>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
> > >> >>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
> > >> >>>>>> > >
> > >> >>>>>> > > In HDFS-7722:
> > >> >>>>>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
> > >> >>>>>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
> > >> >>>>>> > >
> > >> >>>>>> > > Also I ran mvn test several times on my machine and all tests passed.
> > >> >>>>>> > >
> > >> >>>>>> > > However, note that DiskChecker#checkDirAccess() does this:
> > >> >>>>>> > >
> > >> >>>>>> > >   private static void checkDirAccess(File dir) throws DiskErrorException {
> > >> >>>>>> > >     if (!dir.isDirectory()) {
> > >> >>>>>> > >       throw new DiskErrorException("Not a directory: "
> > >> >>>>>> > >                                    + dir.toString());
> > >> >>>>>> > >     }
> > >> >>>>>> > >
> > >> >>>>>> > >     checkAccessByFileMethods(dir);
> > >> >>>>>> > >   }
> > >> >>>>>> > >
> > >> >>>>>> > > So one potentially safer alternative is replacing the data dir with a
> > >> >>>>>> > > regular file to simulate disk failures.
> > >> >>>>>> > >
> > >> >>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:
> > >> >>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
> > >> >>>>>> > >> TestDataNodeVolumeFailureReporting, and
> > >> >>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions from
> > >> >>>>>> > >> directories like the one Colin mentioned to simulate disk failures at data
> > >> >>>>>> > >> nodes. I reviewed the code for all of those, and they all appear to be
> > >> >>>>>> > >> doing the necessary work to restore executable permissions at the end of
> > >> >>>>>> > >> the test. The only recent uncommitted patch I've seen that makes changes
> > >> >>>>>> > >> in these test suites is HDFS-7722. That patch still looks fine though. I
> > >> >>>>>> > >> don't know if there are other uncommitted patches that changed these test
> > >> >>>>>> > >> suites.
> > >> >>>>>> > >>
> > >> >>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly died
> > >> >>>>>> > >> after removing executable permissions but before restoring them. That
> > >> >>>>>> > >> always would have been a weakness of these test suites, regardless of any
> > >> >>>>>> > >> recent changes.
> > >> >>>>>> > >>
> > >> >>>>>> > >> Chris Nauroth
> > >> >>>>>> > >> Hortonworks
> > >> >>>>>> > >> http://hortonworks.com/
> > >> >>>>>> > >>
> > >> >>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
> > >> >>>>>> > >>
> > >> >>>>>> > >>>Hey Colin,
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on with
> > >> >>>>>> > >>>these boxes. He took a look and concluded that some perms are being set in
> > >> >>>>>> > >>>those directories by our unit tests which are precluding those files from
> > >> >>>>>> > >>>getting deleted. He's going to clean up the boxes for us, but we should
> > >> >>>>>> > >>>expect this to keep happening until we can fix the test in question to
> > >> >>>>>> > >>>properly clean up after itself.
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>To help narrow down which commit it was that started this, Andrew sent me
> > >> >>>>>> > >>>this info:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
> > >> >>>>>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way since
> > >> >>>>>> > >>>9:32 UTC on March 5th."
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>--
> > >> >>>>>> > >>>Aaron T. Myers
> > >> >>>>>> > >>>Software Engineer, Cloudera
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:
> > >> >>>>>> > >>>
> > >> >>>>>> > >>>> Hi all,
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find any
> > >> >>>>>> > >>>> jenkins jobs that succeeded from the last 24 hours. Most of them seem
> > >> >>>>>> > >>>> to be failing with some variant of this message:
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> [ERROR] Failed to execute goal
> > >> >>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
> > >> >>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
> > >> >>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
> > >> >>>>>> > >>>> -> [Help 1]
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Any ideas how this happened? Bad disk, unit test setting wrong
> > >> >>>>>> > >>>> permissions?
> > >> >>>>>> > >>>>
> > >> >>>>>> > >>>> Colin
> > >> >>>>>> > >>>>
> > >> >>>>>> > >
> > >> >>>>>> > > --
> > >> >>>>>> > > Lei (Eddy) Xu
> > >> >>>>>> > > Software Engineer, Cloudera
> > >> >>>>>> >
> > >> >>
> > >> >> --
> > >> >> Lei (Eddy) Xu
> > >> >> Software Engineer, Cloudera
> > >> >
> > >
> > > --
> > > Sean
> >
> > --
> > Sean
>

--
Sean