This problem seems to be gone, at least for now. I have made a temporary (as of now) commit to restore the execute permissions for the hadoop-hdfs/target/test/data directory.
The problem was often seen on the H9 node, but multiple builds have now executed on that node.

Regards,
Vinay

On Tue, Mar 17, 2015 at 9:53 PM, Vinayakumar B <vinayakum...@apache.org> wrote:

Yes, just create some directory with some contents in it within the target directory, and set its permissions to 600. Then run either 'mvn clean' or 'git clean'.

-Vinay

On Tue, Mar 17, 2015 at 9:13 PM, Sean Busbey <bus...@cloudera.com> wrote:

Is the simulation just removing the executable bit on the directory? I'd like to get something I can reproduce locally.

On Tue, Mar 17, 2015 at 2:29 AM, Vinayakumar B <vinayakum...@apache.org> wrote:

I have simulated the problem in my environment and verified that neither 'git clean -xdf' nor 'mvn clean' will remove the directory. mvn fails, whereas git simply ignores the problem without even displaying a warning.

Regards,
Vinay

On Tue, Mar 17, 2015 at 2:32 AM, Sean Busbey <bus...@cloudera.com> wrote:

Can someone point me to an example build that is broken?

On Mon, Mar 16, 2015 at 3:52 PM, Sean Busbey <bus...@cloudera.com> wrote:

I'm on it. HADOOP-11721

On Mon, Mar 16, 2015 at 3:44 PM, Haohui Mai <whe...@apache.org> wrote:

+1 for git clean.

Colin, can you please get it in ASAP? Currently, due to the jenkins issues, we cannot close the 2.7 blockers.

Thanks,
Haohui

On Mon, Mar 16, 2015 at 11:54 AM, Colin P. McCabe <cmcc...@apache.org> wrote:

If all it takes is someone creating a test that makes a directory without -x, this is going to happen over and over.

Let's just fix the problem at the root by running "git clean -fqdx" in our jenkins scripts.
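The chmod-600 reproduction Vinay describes can be sketched as a small shell session. The paths below are illustrative stand-ins for the real build tree, and the blocked deletion assumes a non-root user:

```shell
# Sketch of the reproduction from the thread: a directory under target/
# with mode 600 (no execute bit) cannot be traversed, so the files
# inside it cannot be unlinked. Paths are illustrative only.
workdir=$(mktemp -d)
mkdir -p "$workdir/target/test/data/stuck"
touch "$workdir/target/test/data/stuck/blk_0"
chmod 600 "$workdir/target/test/data/stuck"

# As a non-root user this is blocked: the directory cannot be entered.
if rm -rf "$workdir/target" 2>/dev/null; then clean=ok; else clean=blocked; fi
echo "clean without execute bit: $clean"

# Restoring u+rwx (what the temporary commit does) unblocks the delete.
chmod u+rwx "$workdir/target/test/data/stuck" 2>/dev/null
rm -rf "$workdir/target"
rmdir "$workdir"
```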
If there are no objections, I will add this in and un-break the builds.

best,
Colin

On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:

I filed HDFS-7917 to change the way we simulate disk failures.

But I think we still need the infrastructure folks to help with the jenkins scripts to clean up the directories left behind today.

On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:

Any updates on this issue? It seems that all HDFS jenkins builds are still failing.

Regards,
Haohui

On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B <vinayakum...@apache.org> wrote:

I think the problem started from here:

https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

As Chris mentioned, TestDataNodeVolumeFailure changes the permissions. But with this patch, ReplicationMonitor hit an NPE and received a terminate signal, due to which MiniDFSCluster.shutdown() threw an exception.

TestDataNodeVolumeFailure#tearDown() only restores those permissions after shutting down the cluster, so in this case, IMO, the permissions were never restored:
    @After
    public void tearDown() throws Exception {
      if (data_fail != null) {
        FileUtil.setWritable(data_fail, true);
      }
      if (failedDir != null) {
        FileUtil.setWritable(failedDir, true);
      }
      if (cluster != null) {
        cluster.shutdown();
      }
      for (int i = 0; i < 3; i++) {
        FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
        FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
      }
    }

Regards,
Vinay

On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org> wrote:

When I look at the history of these kinds of builds, all of them failed on node H9.

I think some uncommitted patch would have created the problem and left it there.

Regards,
Vinay

On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:

You could rely on a destructive git clean call instead of maven to do the directory removal.

--
Sean

On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:

Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them?
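A defensive pre-clean step along the lines discussed in the thread (restore missing execute bits first, then run a destructive git clean) might look like the sketch below. The repository layout is a stand-in for the real Jenkins workspace, not the actual script:

```shell
# Sketch of a CI pre-clean step combining the two ideas above: first
# give back the execute bit on any directory a test left locked, then
# let a destructive 'git clean' delete all untracked build output.
# ('git clean -xdf' alone silently skips directories it cannot enter,
# and 'mvn clean' fails outright.) The repo here is a stand-in.
sandbox=$(mktemp -d)
git -C "$sandbox" init -q
mkdir -p "$sandbox/target/data/data3"
touch "$sandbox/target/data/data3/blk_0"
chmod 600 "$sandbox/target/data/data3"

# Restore u+rwx on every directory missing its execute bit; '-exec ... ;'
# chmods each directory before find tries to descend into it.
find "$sandbox" -type d ! -perm -u+x -exec chmod u+rwx {} \; 2>/dev/null

# Now the destructive clean can actually remove everything untracked.
git -C "$sandbox" clean -fqdx

remaining=$(ls "$sandbox")   # only the hidden .git directory is left
rm -rf "$sandbox"
```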
Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user); it's simply that the code refuses to do it.

Otherwise, I guess we can just fix those tests...

Colin

On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:

Thanks a lot for looking into HDFS-7722, Chris.

In HDFS-7722:
The TestDataNodeVolumeFailureXXX tests reset data dir permissions in TearDown().
TestDataNodeHotSwapVolumes resets permissions in a finally clause.

Also, I ran mvn test several times on my machine and all tests passed.

However, since DiskChecker#checkDirAccess() is:

    private static void checkDirAccess(File dir) throws DiskErrorException {
      if (!dir.isDirectory()) {
        throw new DiskErrorException("Not a directory: " + dir.toString());
      }

      checkAccessByFileMethods(dir);
    }

One potentially safer alternative is replacing the data dir with a regular file to simulate disk failures.
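Lei's safer alternative can be sketched in the shell, with the "not a directory" condition from checkDirAccess approximated by a plain directory test; the volume path is illustrative:

```shell
# Sketch of the safer simulation suggested above: swap the data
# directory for a regular file. A 'not a directory' check (like the one
# in DiskChecker#checkDirAccess) then flags the volume as failed, and
# cleanup is an ordinary delete with no permission bits to restore.
# The volume path is illustrative.
base=$(mktemp -d)
vol="$base/data1"
mkdir "$vol"

# "Fail" the volume: replace the directory with a plain file.
rmdir "$vol"
touch "$vol"

if [ -d "$vol" ]; then state="healthy"; else state="failed (not a directory)"; fi
echo "volume state: $state"

# Unlike the chmod approach, this cleanup cannot leave an undeletable tree.
rm -r "$base"
```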
On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com> wrote:

TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure, TestDataNodeVolumeFailureReporting, and TestDataNodeVolumeFailureToleration all remove executable permissions from directories like the one Colin mentioned to simulate disk failures at data nodes. I reviewed the code for all of those, and they all appear to be doing the necessary work to restore executable permissions at the end of the test. The only recent uncommitted patch I've seen that makes changes in these test suites is HDFS-7722. That patch still looks fine though. I don't know if there are other uncommitted patches that changed these test suites.

I suppose it's also possible that the JUnit process unexpectedly died after removing executable permissions but before restoring them. That always would have been a weakness of these test suites, regardless of any recent changes.

Chris Nauroth
Hortonworks
http://hortonworks.com/

On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:

Hey Colin,

I asked Andrew Bayer, who works with Apache Infra, what's going on with these boxes. He took a look and concluded that some perms are being set in those directories by our unit tests which are precluding those files from getting deleted. He's going to clean up the boxes for us, but we should expect this to keep happening until we can fix the test in question to properly clean up after itself.

To help narrow down which commit it was that started this, Andrew sent me this info:

"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/ has 500 perms, so I'm guessing that's the problem. Been that way since 9:32 UTC on March 5th."

--
Aaron T. Myers
Software Engineer, Cloudera

On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org> wrote:

Hi all,

A very quick (and not thorough) survey shows that I can't find any jenkins jobs that succeeded in the last 24 hours. Most of them seem to be failing with some variant of this message:

    [ERROR] Failed to execute goal
    org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
    on project hadoop-hdfs: Failed to clean project: Failed to delete
    /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
    -> [Help 1]

Any ideas how this happened? Bad disk, unit test setting wrong permissions?

Colin