I think the problem started here:

https://builds.apache.org/job/PreCommit-HDFS-Build/9828/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestDataNodeVolumeFailure/testUnderReplicationAfterVolFailure/

As Chris mentioned, TestDataNodeVolumeFailure changes the permissions.
But in this patch's test run, ReplicationMonitor hit an NPE and received
a terminate signal, which caused MiniDFSCluster.shutdown() to throw an
exception.

However, TestDataNodeVolumeFailure#tearDown() restores those permissions
only after shutting down the cluster. So in this case, IMO, the
permissions were never restored.


  @After
  public void tearDown() throws Exception {
    if (data_fail != null) {
      FileUtil.setWritable(data_fail, true);
    }
    if (failedDir != null) {
      FileUtil.setWritable(failedDir, true);
    }
    if (cluster != null) {
      cluster.shutdown();
    }
    for (int i = 0; i < 3; i++) {
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 1)), true);
      FileUtil.setExecutable(new File(dataDir, "data" + (2 * i + 2)), true);
    }
  }
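If that reading is right, one fix is to wrap the cluster shutdown in a try/finally so an exception cannot skip the permission reset. A minimal standalone sketch of that shape (plain java.io stands in for Hadoop's FileUtil, and the class and method names here are hypothetical, not the test's):

```java
import java.io.File;
import java.nio.file.Files;

public class TearDownSketch {
    // Restore the permission bits even if shutting the cluster down throws.
    static void tearDown(File dataDir, Runnable shutdownCluster) {
        try {
            shutdownCluster.run();           // may throw, e.g. after the ReplicationMonitor NPE
        } finally {
            for (int i = 1; i <= 6; i++) {   // data1 .. data6, as in the test above
                File d = new File(dataDir, "data" + i);
                if (d.exists()) {
                    d.setWritable(true);
                    d.setExecutable(true);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        File dataDir = Files.createTempDirectory("dfs-data").toFile();
        File d1 = new File(dataDir, "data1");
        d1.mkdir();
        d1.setExecutable(false);             // the fault injection the tests perform
        try {
            tearDown(dataDir, () -> { throw new RuntimeException("shutdown failed"); });
        } catch (RuntimeException expected) {
            // the shutdown failure still propagates to the caller
        }
        System.out.println(d1.canExecute()); // permissions restored despite the exception
    }
}
```

With this shape a shutdown failure still surfaces, but the data directories end up writable and executable again, so the next Maven clean can delete them.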


Regards,
Vinay

On Thu, Mar 12, 2015 at 12:35 PM, Vinayakumar B <vinayakum...@apache.org>
wrote:

> Looking at the history of these kinds of builds, all of them failed on
> node H9.
>
> I think some uncommitted patch or other created the problem and left it
> there.
>
>
> Regards,
> Vinay
>
> On Thu, Mar 12, 2015 at 6:16 AM, Sean Busbey <bus...@cloudera.com> wrote:
>
>> You could rely on a destructive git clean call instead of maven to do the
>> directory removal.
>>
>> --
>> Sean
>> On Mar 11, 2015 4:11 PM, "Colin McCabe" <cmcc...@alumni.cmu.edu> wrote:
>>
>> > Is there a maven plugin or setting we can use to simply remove
>> > directories that have no executable permissions on them?  Clearly we
>> > have the permission to do this from a technical point of view (since
>> > we created the directories as the jenkins user), it's simply that the
>> > code refuses to do it.
>> >
>> > Otherwise I guess we can just fix those tests...
>> >
>> > Colin
>> >
>> > On Tue, Mar 10, 2015 at 2:43 PM, Lei Xu <l...@cloudera.com> wrote:
>> > > Thanks a lot for looking into HDFS-7722, Chris.
>> > >
>> > > In HDFS-7722:
>> > > TestDataNodeVolumeFailureXXX tests reset data dir permissions in
>> > TearDown().
>> > > TestDataNodeHotSwapVolumes reset permissions in a finally clause.
>> > >
>> > > Also I ran mvn test several times on my machine and all tests passed.
>> > >
>> > > However, since in DiskChecker#checkDirAccess():
>> > >
>> > > private static void checkDirAccess(File dir) throws DiskErrorException {
>> > >   if (!dir.isDirectory()) {
>> > >     throw new DiskErrorException("Not a directory: "
>> > >                                  + dir.toString());
>> > >   }
>> > >
>> > >   checkAccessByFileMethods(dir);
>> > > }
>> > >
>> > > One potentially safer alternative is replacing the data dir with a
>> > > regular file to simulate disk failures.
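Lei's alternative can be sketched without Hadoop on the classpath; `checkDirAccess` below only mirrors the shape of the quoted DiskChecker method, and the rest is hypothetical scaffolding:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class FakeDiskFailure {
    // Mirrors the shape of DiskChecker#checkDirAccess quoted above,
    // but self-contained so it runs without Hadoop.
    static void checkDirAccess(File dir) throws IOException {
        if (!dir.isDirectory()) {
            throw new IOException("Not a directory: " + dir);
        }
        // (the real method goes on to call checkAccessByFileMethods(dir))
    }

    public static void main(String[] args) throws Exception {
        File dataDir = Files.createTempDirectory("dfs").toFile();
        File volume = new File(dataDir, "data3");
        volume.mkdir();

        // Simulate a failed disk: swap the volume directory for a plain file.
        volume.delete();                      // empty directory, plain delete is enough
        Files.createFile(volume.toPath());

        try {
            checkDirAccess(volume);
            System.out.println("unexpected: check passed");
        } catch (IOException e) {
            System.out.println("disk failure detected: " + e.getMessage());
        }
    }
}
```

No permission bits are touched, so there is nothing for tearDown() to forget to restore.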
>> > >
>> > > On Tue, Mar 10, 2015 at 2:19 PM, Chris Nauroth <cnaur...@hortonworks.com>
>> > > wrote:
>> > >> TestDataNodeHotSwapVolumes, TestDataNodeVolumeFailure,
>> > >> TestDataNodeVolumeFailureReporting, and
>> > >> TestDataNodeVolumeFailureToleration all remove executable permissions
>> > >> from directories like the one Colin mentioned to simulate disk
>> > >> failures at data nodes.  I reviewed the code for all of those, and
>> > >> they all appear to be doing the necessary work to restore executable
>> > >> permissions at the end of the test.  The only recent uncommitted
>> > >> patch I've seen that makes changes in these test suites is HDFS-7722.
>> > >> That patch still looks fine though.  I don't know if there are other
>> > >> uncommitted patches that changed these test suites.
>> > >>
>> > >> I suppose it's also possible that the JUnit process unexpectedly died
>> > >> after removing executable permissions but before restoring them.
>> > >> That always would have been a weakness of these test suites,
>> > >> regardless of any recent changes.
>> > >>
>> > >> Chris Nauroth
>> > >> Hortonworks
>> > >> http://hortonworks.com/
>> > >>
>> > >> On 3/10/15, 1:47 PM, "Aaron T. Myers" <a...@cloudera.com> wrote:
>> > >>
>> > >>>Hey Colin,
>> > >>>
>> > >>>I asked Andrew Bayer, who works with Apache Infra, what's going on
>> > >>>with these boxes. He took a look and concluded that some perms are
>> > >>>being set in those directories by our unit tests which are precluding
>> > >>>those files from getting deleted. He's going to clean up the boxes
>> > >>>for us, but we should expect this to keep happening until we can fix
>> > >>>the test in question to properly clean up after itself.
>> > >>>
>> > >>>To help narrow down which commit it was that started this, Andrew
>> > >>>sent me this info:
>> > >>>
>> > >>>"/home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3/
>> > >>>has 500 perms, so I'm guessing that's the problem. Been that way
>> > >>>since 9:32 UTC on March 5th."
>> > >>>
>> > >>>--
>> > >>>Aaron T. Myers
>> > >>>Software Engineer, Cloudera
>> > >>>
>> > >>>On Tue, Mar 10, 2015 at 1:24 PM, Colin P. McCabe <cmcc...@apache.org>
>> > >>>wrote:
>> > >>>
>> > >>>> Hi all,
>> > >>>>
>> > >>>> A very quick (and not thorough) survey shows that I can't find any
>> > >>>> jenkins jobs that succeeded from the last 24 hours.  Most of them
>> > >>>> seem to be failing with some variant of this message:
>> > >>>>
>> > >>>> [ERROR] Failed to execute goal
>> > >>>> org.apache.maven.plugins:maven-clean-plugin:2.5:clean (default-clean)
>> > >>>> on project hadoop-hdfs: Failed to clean project: Failed to delete
>> > >>>> /home/jenkins/jenkins-slave/workspace/PreCommit-HDFS-Build/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data3
>> > >>>> -> [Help 1]
>> > >>>>
>> > >>>> Any ideas how this happened?  Bad disk, unit test setting wrong
>> > >>>> permissions?
>> > >>>>
>> > >>>> Colin
>> > >>>>
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > Lei (Eddy) Xu
>> > > Software Engineer, Cloudera
>> >
>>
>
>
