I am already using umask 022, and the permissions on all components of the path are OK as well. Also, "ls -ld" succeeds sometimes, but at other times it fails with a SIGPIPE and no error message. Additionally, I saw cases where it SIGPIPE'd but still produced correct output (a "drwxr-xr-x ..." line).
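As a sanity check on the exit code itself: 141 is 128 + 13, and signal 13 is SIGPIPE on Linux, which matches how the JVM surfaces a signal-killed child through Process.waitFor(). A minimal standalone harness along these lines (the LsExitCodeCheck class and its scaffolding are my own illustration, not Hadoop code) reproduces that decoding outside of the test suite:

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Illustrative harness -- not Hadoop code. Spawns "/bin/ls -ld <path>"
// the way Shell.runCommand does (consume stdout fully, then waitFor())
// and decodes the exit status; 141 = 128 + 13 (SIGPIPE) on Linux.
public class LsExitCodeCheck {
  public static void main(String[] args) throws Exception {
    String path = args.length > 0 ? args[0] : ".";
    Process p = new ProcessBuilder("/bin/ls", "-ld", path).start();

    // Read all of stdout before waiting for the process to exit.
    StringBuilder out = new StringBuilder();
    BufferedReader in =
        new BufferedReader(new InputStreamReader(p.getInputStream()));
    String line;
    while ((line = in.readLine()) != null) {
      out.append(line).append('\n');
    }
    in.close();

    int exitCode = p.waitFor();
    if (exitCode == 141) {
      // ls died of SIGPIPE even though we consumed all of its stdout.
      System.err.println("SIGPIPE from ls -ld; captured output: " + out);
    } else if (exitCode != 0) {
      System.err.println("ls -ld failed, exit code " + exitCode);
    } else {
      System.out.print(out);
    }
  }
}

Here is my patch for Hadoop to work around the ls -ld SIGPIPE issue (I just overrode the hadoop-0.20.205.0 jar in my local maven repository to run unit tests):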
Index: src/core/org/apache/hadoop/fs/RawLocalFileSystem.java
===================================================================
--- src/core/org/apache/hadoop/fs/RawLocalFileSystem.java  (revision 1198126)
+++ src/core/org/apache/hadoop/fs/RawLocalFileSystem.java  (working copy)
@@ -416,7 +416,7 @@
       IOException e = null;
       try {
         StringTokenizer t = new StringTokenizer(
-            FileUtil.execCommand(new File(getPath().toUri()),
+            FileUtil.execCommandWithRetries(new File(getPath().toUri()),
             Shell.getGET_PERMISSION_COMMAND()));
         //expected format
         //-rw-------    1 username groupname ...
Index: src/core/org/apache/hadoop/fs/FileUtil.java
===================================================================
--- src/core/org/apache/hadoop/fs/FileUtil.java  (revision 1198126)
+++ src/core/org/apache/hadoop/fs/FileUtil.java  (working copy)
@@ -19,6 +19,7 @@
 package org.apache.hadoop.fs;

 import java.io.*;
+import java.util.Arrays;
 import java.util.Enumeration;
 import java.util.zip.ZipEntry;
 import java.util.zip.ZipFile;
@@ -703,6 +704,20 @@
     String output = Shell.execCommand(args);
     return output;
   }
+
+  static String execCommandWithRetries(File f, String... cmd)
+      throws IOException {
+    for (int attempt = 0; attempt < 10; ++attempt) {
+      try {
+        return execCommand(f, cmd);
+      } catch (IOException ex) {
+        LOG.error("Failed to execute command: f=" + f + " cmd=" +
+            Arrays.toString(cmd) + " (attempt " + attempt + ")", ex);
+      }
+    }
+    return execCommand(f, cmd);
+  }

   /**
    * Create a tmp file for a base file.
Index: src/core/org/apache/hadoop/util/Shell.java
===================================================================
--- src/core/org/apache/hadoop/util/Shell.java  (revision 1198126)
+++ src/core/org/apache/hadoop/util/Shell.java  (working copy)
@@ -239,6 +239,7 @@
       String line = inReader.readLine();
       while(line != null) {
         line = inReader.readLine();
+        LOG.error("Additional line from output: " + line);
       }
       // wait for the process to finish and check the exit code
       exitCode = process.waitFor();
@@ -251,6 +252,25 @@
       completed.set(true);
       //the timeout thread handling
       //taken care in finally block
+      LOG.error("exitCode=" + exitCode);
+      if (exitCode == 141 && this instanceof ShellCommandExecutor) {
+        String[] execStr = getExecString();
+        String outStr = ((ShellCommandExecutor) this).getOutput();
+        LOG.error("execStr=" + java.util.Arrays.toString(execStr) +
+            ", outStr=" + outStr);
+        if (execStr.length >= 2 &&
+            execStr[0].equals("/bin/ls") &&
+            execStr[1].equals("-ld") &&
+            outStr.startsWith("d") &&
+            outStr.length() >= 11 &&
+            outStr.charAt(10) == ' ') {
+          // A work-around for a weird SIGPIPE bug on ls -ld.
+          LOG.error("Ignoring exit code " + exitCode + " for /bin/ls -ld: " +
+              "got output " + outStr);
+          exitCode = 0;
+        }
+      }
+
       if (exitCode != 0) {
         throw new ExitCodeException(exitCode, errMsg.toString());
       }

Thanks,
--Mikhail

On Wed, Dec 7, 2011 at 11:31 AM, Ted Yu <yuzhih...@gmail.com> wrote:

> A tip from Jonathan Hsieh is related to the problem Mikhail was
> experiencing:
> --------
> Run:
> umask 022
>
> before running the test on whatever machine you are testing on.
>
> --------
>
> Cheers
>
> On Wed, Dec 7, 2011 at 10:29 AM, Ted Yu <yuzhih...@gmail.com> wrote:
>
> > Mikhail:
> > Your patch was stripped by email server.
> >
> > I assume you have verified permission for all components of path:
> >
> > /data/users/mbautin/workdirs/hb-os/target/test-data/37d6e996-cba6-4a12-85bc-dbcf2e91d297
> >
> > Cheers
> >
> >
> > On Tue, Dec 6, 2011 at 5:07 PM, Mikhail Bautin <bautin.mailing.li...@gmail.com> wrote:
> >
> >> Hello,
> >>
> >> I've been running into the following issue when running HBase tests. A
> >> lot of them would fail with an exception similar to the one shown below
> >> (I added more information to the exception messages). Exit code 141
> >> seems to correspond to SIGPIPE, but I did not find anything obvious in
> >> Shell.java in Hadoop. In Shell.runCommand we read the error stream on a
> >> separate thread and the output stream on the main thread, and consume
> >> both streams completely before waiting for the external process to
> >> terminate. I ended up doing a hacky work-around (patch attached) to be
> >> able to run HBase tests, but any insight about what could be causing
> >> this issue is appreciated. HBase trunk uses hadoop-0.20.205.0 by
> >> default.
> >>
> >> Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 2.544 sec <<< FAILURE!
> >> org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
> >> Time elapsed: 0 sec <<< ERROR!
> >> java.lang.RuntimeException: Error while running command to get file
> >> permissions : org.apache.hadoop.util.Shell$ExitCodeException: Command:
> >> [/bin/ls, -ld,
> >> /data/users/mbautin/workdirs/hb-os/target/test-data/37d6e996-cba6-4a12-85bc-dbcf2e91d297/dfscluster_76df8fc0-6827-4d9d-8728-eb5ee43b0bae/dfs/data/data3],
> >> message:
> >> , exitCode: 141
> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:283)
> >>         at org.apache.hadoop.util.Shell.run(Shell.java:183)
> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:405)
> >>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:491)
> >>         at org.apache.hadoop.util.Shell.execCommand(Shell.java:474)
> >>         at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:703)
> >>         at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:418)
> >>         at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getPermission(RawLocalFileSystem.java:393)
> >>         at org.apache.hadoop.util.DiskChecker.mkdirsWithExistsAndPermissionCheck(DiskChecker.java:146)
> >>         at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:162)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1537)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1484)
> >>         at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1459)
> >>         at org.apache.hadoop.hdfs.MiniDFSCluster.startDataNodes(MiniDFSCluster.java:417)
> >>         at org.apache.hadoop.hdfs.MiniDFSCluster.<init>(MiniDFSCluster.java:280)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniDFSCluster(HBaseTestingUtility.java:369)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:537)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:493)
> >>         at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:480)
> >>         at org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort.setupBeforeClass(TestRegionServerCoprocessorExceptionWithAbort.java:94)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
> >>         at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
> >>         at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
> >>         at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:27)
> >>         at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:30)
> >>         at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
> >>         at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:53)
> >>         at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:123)
> >>         at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:104)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>         at java.lang.reflect.Method.invoke(Method.java:597)
> >>         at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
> >>         at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
> >>         at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
> >>         at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:81)
> >>         at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
> >>
> >> Thanks,
> >> --Mikhail
> >>
> >
> >
>