Thanks John. I tried it again and the error didn't occur. So who knows. Now that I've got a full run-through, I'll try to update the wiki with what I needed.
I'm currently doing a prototype, but if we move forward I'll look more into HIVE-2365. This current method is, as you point out, not great :-)

----- Original Message -----
From: John Sichi <jsi...@fb.com>
To: "<user@hive.apache.org>" <user@hive.apache.org>; Ben West <bwsithspaw...@yahoo.com>
Cc:
Sent: Thursday, November 17, 2011 3:56 PM
Subject: Re: Hive HBase wiki

It has been quite a while since those instructions were written, so maybe something has broken. There is a unit test for it (hbase-handler/src/test/queries/hbase_bulk.m) which is still passing.

If you're running via CLI, logs by default go in /tmp/<username>.

Long-term, energy best expended on this would go here:

https://issues.apache.org/jira/browse/HIVE-2365

JVS

On Nov 17, 2011, at 10:59 AM, Ben West wrote:

> Hey all,
>
> I'm having some trouble with the HBase bulk load, following the
> instructions from https://cwiki.apache.org/confluence/display/Hive/HBaseBulkLoad.
> In the last step ("Sort Data") I get:
>
> java.lang.RuntimeException: Hive Runtime Error while closing operators:
> java.io.IOException: No files found in
> hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:311)
>     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:479)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:270)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1127)
>     at org.apache.hadoop.mapred.Child.main(Child.java:264)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> java.io.IOException: No files found in
> hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:171)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:642)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>     at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>     at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
>     ... 7 more
> Caused by: java.io.IOException: No files found in
> hdfs://localhost/tmp/hive-cloudera/hive_2011-11-17_10-30-11_023_3494196694520237582/_tmp.-ext-10000/_tmp.000001_2
>     at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat$2.close(HiveHFileOutputFormat.java:144)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:168)
>     ... 11 more
>
> When I look at the source of HiveHFileOutputFormat.java it has:
>
>   // Move the region file(s) from the task output directory
>   // to the location specified by the user. There should
>   // actually only be one (each reducer produces one HFile),
>   // but we don't know what its name is.
>   FileSystem fs = outputdir.getFileSystem(jc);
>   fs.mkdirs(columnFamilyPath);
>   Path srcDir = outputdir;
>   for (;;) {
>     FileStatus [] files = fs.listStatus(srcDir);
>     if ((files == null) || (files.length == 0)) {
>       throw new IOException("No files found in " + srcDir);
>     }
>
> So I am getting the issue where the "task output directory" is empty. I
> assume this is because the earlier task failed, but I'm not sure how to
> check this.
> Does anyone know what is going on or how I can find the error log of
> whatever was supposed to populate this directory?
>
> Thanks!
> -Ben
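
For anyone who lands on this thread later: the failing check is just a FileSystem.listStatus() call, so you can reproduce it by hand against the _tmp path from the exception to see whether the reducer actually left anything behind. Below is a minimal standalone sketch; the class name ListTaskOutput and the args[0] handling are illustrative additions, not anything from Hive itself.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListTaskOutput {
  public static void main(String[] args) throws Exception {
    // Directory to inspect, e.g. the _tmp path from the stack trace.
    Path srcDir = new Path(args[0]);

    // Picks up core-site.xml from the classpath, so hdfs:// URIs
    // resolve the same way they do for the Hive job.
    Configuration conf = new Configuration();
    FileSystem fs = srcDir.getFileSystem(conf);

    // Same call HiveHFileOutputFormat makes before throwing
    // "No files found in ...".
    FileStatus[] files = fs.listStatus(srcDir);
    if (files == null || files.length == 0) {
      System.out.println("Empty or missing: " + srcDir);
      return;
    }
    for (FileStatus f : files) {
      System.out.println(f.getPath() + "  " + f.getLen() + " bytes");
    }
  }
}

The quick equivalent from a shell is "hadoop fs -ls" on the same path. Either way, an empty directory here means the reducer never wrote an HFile (per the comment in the quoted source, each reducer is supposed to produce exactly one), which fits Ben's guess that an earlier task failed; the task logs under /tmp/<username> that John mentions are the next place to look.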