Hi,
Seeking advice from the experts here on the SIGPIPE issue described in the forwarded thread below.

Thanks

---------- Forwarded message ----------
From: Mikhail Bautin <bautin.mailing.li...@gmail.com>
Date: Wed, Dec 7, 2011 at 11:52 AM
Subject: Re: strange SIGPIPE from "bin/ls -ld" in Shell.runCommand seen
when running HBase tests
To: d...@hbase.apache.org, common-...@hadoop.apache.org


I am already using umask 022, and permissions on all components of the path
are OK.  Also, "ls -ld" succeeds sometimes, but other times it fails with a
SIGPIPE and no error message; I have even seen cases where it SIGPIPE'd but
still produced correct output (a "drwxr-xr-x ..." line).  Here is my patch
for Hadoop to work around the "ls -ld" SIGPIPE issue (I simply overrode the
hadoop-0.20.205.0 jar in my local Maven repository to run the unit tests).
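
As a side note (my addition, not part of the original report): shells report an exit status of 128 + N when a child process is killed by signal N, so exit code 141 decodes to signal 13, which is SIGPIPE. A minimal sketch of that decoding (class and method names are mine, for illustration only):

```java
// Sketch: decode a "killed by signal" shell exit status.
// Shells report 128 + signal number when the child dies on a signal,
// so 141 -> 141 - 128 = 13, and signal 13 is SIGPIPE on Linux.
public class ExitCodeDecoder {
    static final int SIGNAL_BASE = 128;

    /** Returns the signal number, or 0 if the code is a normal exit status. */
    static int signalFromExitCode(int exitCode) {
        return exitCode > SIGNAL_BASE ? exitCode - SIGNAL_BASE : 0;
    }

    public static void main(String[] args) {
        System.out.println(signalFromExitCode(141)); // prints 13 (SIGPIPE)
        System.out.println(signalFromExitCode(0));   // prints 0 (normal exit)
    }
}
```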

Index: src/core/org/apache/hadoop/fs/RawLocalFileSystem.java
===================================================================
--- src/core/org/apache/hadoop/fs/RawLocalFileSystem.java (revision 1198126)
+++ src/core/org/apache/hadoop/fs/RawLocalFileSystem.java (working copy)
@@ -416,7 +416,7 @@
      IOException e = null;
      try {
        StringTokenizer t = new StringTokenizer(
-            FileUtil.execCommand(new File(getPath().toUri()),
+            FileUtil.execCommandWithRetries(new File(getPath().toUri()),
                                 Shell.getGET_PERMISSION_COMMAND()));
        //expected format
        //-rw-------    1 username groupname ...
Index: src/core/org/apache/hadoop/fs/FileUtil.java
===================================================================
--- src/core/org/apache/hadoop/fs/FileUtil.java (revision 1198126)
+++ src/core/org/apache/hadoop/fs/FileUtil.java (working copy)
@@ -19,6 +19,7 @@
 package org.apache.hadoop.fs;

 import java.io.*;
+import java.util.Arrays;
 import java.util.Enumeration;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipFile;
@@ -703,6 +704,20 @@
    String output = Shell.execCommand(args);
    return output;
  }
+
+  static String execCommandWithRetries(File f, String... cmd)
+      throws IOException {
+    for (int attempt = 0; attempt < 10; ++attempt) {
+      try {
+        return execCommand(f, cmd);
+      } catch (IOException ex) {
+        LOG.error("Failed to execute command: f=" + f + " cmd=" +
+            Arrays.toString(cmd) + " (attempt " + attempt + ")",
+            ex);
+      }
+    }
+    return execCommand(f, cmd);
+  }

  /**
   * Create a tmp file for a base file.
Index: src/core/org/apache/hadoop/util/Shell.java
===================================================================
--- src/core/org/apache/hadoop/util/Shell.java (revision 1198126)
+++ src/core/org/apache/hadoop/util/Shell.java (working copy)
@@ -239,6 +239,7 @@
      String line = inReader.readLine();
      while(line != null) {
+        LOG.error("Additional line from output: " + line);
        line = inReader.readLine();
      }
      // wait for the process to finish and check the exit code
      exitCode  = process.waitFor();
@@ -251,6 +252,25 @@
      completed.set(true);
      //the timeout thread handling
      //taken care in finally block
+      LOG.error("exitCode=" + exitCode);
+      if (exitCode == 141 && this instanceof ShellCommandExecutor) {
+        String[] execStr = getExecString();
+        String outStr = ((ShellCommandExecutor) this).getOutput();
+        LOG.error("execStr=" + java.util.Arrays.toString(execStr) +
+            ", outStr=" + outStr);
+        if (execStr.length >= 2 &&
+            execStr[0].equals("/bin/ls") &&
+            execStr[1].equals("-ld") &&
+            outStr.startsWith("d") &&
+            outStr.length() >= 11 &&
+            outStr.charAt(10) == ' ') {
+          // A work-around for a weird SIGPIPE bug on ls -ld.
+          LOG.error("Ignoring exit code " + exitCode + " for /bin/ls -ld: " +
+              "got output " + outStr);
+          exitCode = 0;
+        }
+      }
+
      if (exitCode != 0) {
        throw new ExitCodeException(exitCode, errMsg.toString());
      }
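
The execCommandWithRetries helper above boils down to a generic bounded-retry pattern: retry a flaky action a fixed number of times, logging each failure, then make one last attempt whose exception propagates. A standalone sketch of that pattern (class and names are mine, not Hadoop's):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Standalone sketch of the bounded-retry pattern used by
// execCommandWithRetries above: retry up to maxRetries times,
// logging each failure, then make a final attempt outside the
// try/catch so its exception (if any) reaches the caller.
public class BoundedRetry {
    static <T> T withRetries(Callable<T> action, int maxRetries) throws Exception {
        for (int attempt = 0; attempt < maxRetries; ++attempt) {
            try {
                return action.call();
            } catch (Exception ex) {
                System.err.println("attempt " + attempt + " failed: " + ex);
            }
        }
        return action.call(); // final attempt: no catch, exception propagates
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulate a transient failure that clears up on the third call.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IOException("transient failure");
            }
            return "ok";
        }, 10);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```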

Thanks,
--Mikhail

On Wed, Dec 7, 2011 at 11:31 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> A tip from Jonathan Hsieh is related to the problem Mikhail was
> experiencing:
> --------
> Run:
> umask 022
>
> before running the test on whatever machine you are testing on.
>
> --------
>
> Cheers
>
> On Wed, Dec 7, 2011 at 10:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Mikhail:
> > Your patch was stripped by email server.
> >
> > I assume you have verified permission for all components of path:
> >
> > /data/users/mbautin/workdirs/hb-os/target/test-data/37d6e996-cba6-4a12-85bc-dbcf2e91d297
> >
> > Cheers
> >
> >
> > On Tue, Dec 6, 2011 at 5:07 PM, Mikhail Bautin <
> > bautin.mailing.li...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I've been running into the following issue when running HBase tests. A
> >> lot of them fail with an exception similar to the one shown below (I
> >> added more information to the exception messages). Exit code 141 seems
> >> to correspond to SIGPIPE, but I did not find anything obvious in
> >> Shell.java in Hadoop. In Shell.runCommand we read the error stream on a
> >> separate thread and the output stream on the main thread, and we consume
> >> both streams completely before waiting for the external process to
> >> terminate. I ended up with a hacky work-around (patch attached) to be
> >> able to run the HBase tests, but any insight into what could be causing
> >> this issue is appreciated. HBase trunk uses hadoop-0.20.205.0 by default.
> >>
> >> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.544 sec
> >> <<< FAILURE!
> >> org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
> >>  Time elapsed: 0 sec  <<< ERROR!
> >> java.lang.RuntimeException: Error while running command to get file
> >> permissions : org.apache.hadoop.util.Shell$ExitCodeException: Command:
> >> [/bin/ls, -ld,
> >> /data/users/mbautin/workdirs/hb-os/target/test-data/37d6e996-cba6-4a12-85bc-dbcf2e91d297/dfscluster_76df8fc0-6827-4d9d-8728-eb5ee43b0bae/dfs/data/data3],
> >> message:
> >>  , exitCode: 141
> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:283)
> >>         at org.apache.hadoop.util.Shell.run(Shell.java:183)
> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:405)
> >>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:491)
> >>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:474)
> >>         at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:703)
> >>         at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:418)
> >>         at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getPermission(RawLocalFileSystem.java:393)
> >>         at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:146)
> >>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:162)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1537)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1484)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1459)
> >>         at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
> >>         at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:280)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:369)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:537)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:493)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:480)
> >>         at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.setupBeforeClass(TestRegionServerCoprocessorExceptionWithAbort.java:94)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> >>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> >>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> >>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> >>         at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> >>         at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
> >>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
> >>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
> >>         at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
> >>         at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
> >>         at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
> >>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
> >>
> >> Thanks,
> >> --Mikhail
> >>
> >
> >
>
