+1 for the git clean command.

HDFS-7917 might still be valuable for enabling us to run a few unit tests
on Windows that are currently skipped.  Let's please keep it open, but
it's less urgent.

Thanks!

Chris Nauroth
Hortonworks
http://hortonworks.com/

On 3/16/15, 11:54 AM, "Colin P. McCabe" <cmcc...@apache.org> wrote:

>If all it takes is someone creating a test that makes a directory
>without -x, this is going to happen over and over.
>
>Let's just fix the problem at the root by running "git clean -fqdx" in
>our jenkins scripts.  If there are no objections, I will add this in and
>un-break the builds.
>
>best,
>Colin
>
>On Fri, Mar 13, 2015 at 1:48 PM, Lei Xu <l...@cloudera.com> wrote:
>> I filed HDFS-7917 to change the way we simulate disk failures.
>>
>> But I think we still need infrastructure folks to help with the jenkins
>> scripts to clean up the directories that have already been left behind.
>>
>> On Fri, Mar 13, 2015 at 1:38 PM, Mai Haohui <ricet...@gmail.com> wrote:
>>> Any updates on this issue? It seems that all HDFS jenkins builds are
>>> still failing.
>>>
>>> Regards,
>>> Haohui
>>>
>>> On Thu, Mar 12, 2015 at 12:53 AM, Vinayakumar B
>>><vinayakum...@apache.org> wrote:
>>>> I think the problem started from here.
>>>>
>>>> 
>>>> https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/
>>>>
>>>> As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
>>>> But in this run, ReplicationMonitor hit an NPE and got a terminate
>>>> signal, due to which MiniDFSCluster.shutdown() threw an exception.
>>>>
>>>> But TestDataNodeVolumeFailure#tearDown() only restores the executable
>>>> permissions after shutting down the cluster, so when shutdown() threw,
>>>> they were never restored. IMO that is what happened in this case.
>>>>
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     if(data_fail != null) {
>>>>       FileUtil.setWritable(data_fail, true);
>>>>     }
>>>>     if(failedDir != null) {
>>>>       FileUtil.setWritable(failedDir, true);
>>>>     }
>>>>     if(cluster != null) {
>>>>       cluster.shutdown();
>>>>     }
>>>>     for (int i = 0; i < 3; i++) {
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>       FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>     }
>>>>   }
>>>>
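>>>> One way to make the restore unconditional would be a try/finally around
>>>> the shutdown. A rough, untested sketch of the same tearDown() (not what
>>>> the test currently does):
>>>>
>>>>   @After
>>>>   public void tearDown() throws Exception {
>>>>     try {
>>>>       if(cluster != null) {
>>>>         cluster.shutdown();
>>>>       }
>>>>     } finally {
>>>>       // Restore permissions even if shutdown() throws, so that the
>>>>       // next "mvn clean" on the slave can delete the data dirs.
>>>>       if(data_fail != null) {
>>>>         FileUtil.setWritable(data_fail, true);
>>>>       }
>>>>       if(failedDir != null) {
>>>>         FileUtil.setWritable(failedDir, true);
>>>>       }
>>>>       for (int i = 0; i < 3; i++) {
>>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>       }
>>>>     }
>>>>   }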
>>>>
>>>> Regards,
>>>> Vinay
>>>>
>>>> On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B
>>>> <vinayakum...@apache.org> wrote:
>>>>
>>>>> Looking at the history of these kinds of builds, all of them failed on
>>>>> node H9.
>>>>>
>>>>> I think some uncommitted patch or other created the problem and left it
>>>>> there.
>>>>>
>>>>>
>>>>> Regards,
>>>>> Vinay
>>>>>
>>>>> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com>
>>>>>wrote:
>>>>>
>>>>>> You could rely on a destructive git clean call instead of maven to do
>>>>>> the directory removal.
>>>>>>
>>>>>> --
>>>>>> Sean
>>>>>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu>
>>>>>>wrote:
>>>>>>
>>>>>> > Is there a maven plugin or setting we can use to simply remove
>>>>>> > directories that have no executable permissions on them?  Clearly we
>>>>>> > have the permission to do this from a technical point of view (since
>>>>>> > we created the directories as the jenkins user); it's simply that the
>>>>>> > code refuses to do it.
>>>>>> >
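>>>>>> > Something like this untested sketch (plain java.io.File calls; the
>>>>>> > helper name is made up), run right before the clean, might do it:
>>>>>> >
>>>>>> >   // Walk the tree and give the owner back rwx on everything, so
>>>>>> >   // that the subsequent delete is no longer refused.
>>>>>> >   static void restorePermissions(File f) {
>>>>>> >     f.setReadable(true);
>>>>>> >     f.setWritable(true);
>>>>>> >     f.setExecutable(true);
>>>>>> >     File[] children = f.listFiles();  // non-null only for listable dirs
>>>>>> >     if (children != null) {
>>>>>> >       for (File child : children) {
>>>>>> >         restorePermissions(child);
>>>>>> >       }
>>>>>> >     }
>>>>>> >   }
>>>>>> >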
>>>>>> > Otherwise I guess we can just fix those tests...
>>>>>> >
>>>>>> > Colin
>>>>>> >
>>>>>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>>>>>> > > Thanks a lot for looking into HDFS-7722, Chris.
>>>>>> > >
>>>>>> > > In HDFS-7722:
>>>>>> > > The TestDataNodeVolumeFailureXXX tests reset the data dir
>>>>>> > > permissions in tearDown().
>>>>>> > > TestDataNodeHotSwapVolumes resets permissions in a finally clause.
>>>>>> > >
>>>>>> > > Also, I ran mvn test several times on my machine and all tests
>>>>>> > > passed.
>>>>>> > >
>>>>>> > > However, note that DiskChecker#checkDirAccess() requires the path
>>>>>> > > to be a directory:
>>>>>> > >
>>>>>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>>>>>> > >   if (!dir.isDirectory()) {
>>>>>> > >     throw new DiskErrorException("Not a directory: "
>>>>>> > >                                  + dir.toString());
>>>>>> > >   }
>>>>>> > >
>>>>>> > >   checkAccessByFileMethods(dir);
>>>>>> > > }
>>>>>> > >
>>>>>> > > Given that check, one potentially safer alternative is to replace
>>>>>> > > the data dir with a regular file to simulate a disk failure.
>>>>>> > >
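>>>>>> > > Roughly like this (an untested sketch; the helper name is made up):
>>>>>> > >
>>>>>> > >   // Simulate a failed volume by swapping the data directory for a
>>>>>> > >   // plain file: checkDirAccess() then fails its isDirectory() check,
>>>>>> > >   // and there are no permission bits to restore afterwards.
>>>>>> > >   private static void simulateVolumeFailure(File dataDir)
>>>>>> > >       throws IOException {
>>>>>> > >     assertTrue(FileUtil.fullyDelete(dataDir));
>>>>>> > >     assertTrue(dataDir.createNewFile());
>>>>>> > >   }
>>>>>> > >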
>>>>>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth
>>>>>> > > <cnaur...@hortonworks.com> wrote:
>>>>>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>>>>>> > >> TestDataNodeVolumeFailureReporting, and
>>>>>> > >> TestDataNodeVolumeFailureToleration all remove executable
>>>>>> > >> permissions from directories like the one Colin mentioned to
>>>>>> > >> simulate disk failures at data nodes.  I reviewed the code for
>>>>>> > >> all of those, and they all appear to be doing the necessary work
>>>>>> > >> to restore executable permissions at the end of the test.  The
>>>>>> > >> only recent uncommitted patch I've seen that makes changes in
>>>>>> > >> these test suites is HDFS-7722.  That patch still looks fine,
>>>>>> > >> though.  I don't know if there are other uncommitted patches that
>>>>>> > >> changed these test suites.
>>>>>> > >>
>>>>>> > >> I suppose it's also possible that the JUnit process unexpectedly
>>>>>> > >> died after removing executable permissions but before restoring
>>>>>> > >> them.  That would always have been a weakness of these test
>>>>>> > >> suites, regardless of any recent changes.
>>>>>> > >>
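>>>>>> > >> If we wanted belt and braces in the tests themselves, a JVM
>>>>>> > >> shutdown hook along these lines could restore the bits (sketch
>>>>>> > >> only; it would not survive a kill -9 of the process):
>>>>>> > >>
>>>>>> > >>   // Re-grant execute on the data dirs even if the JVM exits
>>>>>> > >>   // unexpectedly (but gracefully) in the middle of a test.
>>>>>> > >>   Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
>>>>>> > >>     @Override
>>>>>> > >>     public void run() {
>>>>>> > >>       for (int i = 0; i < 3; i++) {
>>>>>> > >>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+1)), true);
>>>>>> > >>         FileUtil.setExecutable(new File(dataDir, "data"+(2*i+2)), true);
>>>>>> > >>       }
>>>>>> > >>     }
>>>>>> > >>   }));
>>>>>> > >>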
>>>>>> > >> Chris Nauroth
>>>>>> > >> Hortonworks
>>>>>> > >> http://hortonworks.com/
>>>>>> > >>
>>>>>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>>>>>> > >>
>>>>>> > >>>Hey Colin,
>>>>>> > >>>
>>>>>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going
>>>>>> > >>>on with these boxes. He took a look and concluded that some perms
>>>>>> > >>>are being set in those directories by our unit tests which are
>>>>>> > >>>precluding those files from getting deleted. He's going to clean
>>>>>> > >>>up the boxes for us, but we should expect this to keep happening
>>>>>> > >>>until we can fix the test in question to properly clean up after
>>>>>> > >>>itself.
>>>>>> > >>>
>>>>>> > >>>To help narrow down which commit it was that started this,
>>>>>> > >>>Andrew sent me this info:
>>>>>> > >>>
>>>>>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-
>>>>>> >
>>>>>> 
>>>>>>>>>Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/da
>>>>>>>>>ta3/
>>>>>> > has
>>>>>> > >>>500 perms, so I'm guessing that's the problem. Been that way
>>>>>>since
>>>>>> 9:32
>>>>>> > >>>UTC
>>>>>> > >>>on March 5th."
>>>>>> > >>>
>>>>>> > >>>--
>>>>>> > >>>Aaron T. Myers
>>>>>> > >>>Software Engineer, Cloudera
>>>>>> > >>>
>>>>>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe
>>>>>> > >>><cmcc...@apache.org> wrote:
>>>>>> > >>>
>>>>>> > >>>> Hi all,
>>>>>> > >>>>
>>>>>> > >>>> A very quick (and not thorough) survey shows that I can't find
>>>>>> > >>>> any jenkins jobs that succeeded in the last 24 hours.  Most of
>>>>>> > >>>> them seem to be failing with some variant of this message:
>>>>>> > >>>>
>>>>>> > >>>> [ERROR] Failed to execute goal
>>>>>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>>>>>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>>>>>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>>>>>> > >>>> -> [Help 1]
>>>>>> > >>>>
>>>>>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>>>>>> > >>>> permissions?
>>>>>> > >>>>
>>>>>> > >>>> Colin
>>>>>> > >>>>
>>>>>> > >>
>>>>>> > >
>>>>>> > >
>>>>>> > >
>>>>>> > > --
>>>>>> > > Lei (Eddy) Xu
>>>>>> > > Software Engineer, Cloudera
>>>>>> >
>>>>>>
>>>>>
>>>>>
>>
>>
>>
>> --
>> Lei (Eddy) Xu
>> Software Engineer, Cloudera
