Hi,

I am trying to fix a problem I have on some queries, where the last reducer hangs in GC because it receives truckloads of data, by using SET hive.optimize.skewjoin=true; Unfortunately, every time I try this, I get an error of the form:
...
2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
Ended Job = job_201211281801_0463
java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
        at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
        at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
        at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
...

Googling didn't turn up anything on how to debug or solve this, so I'd be glad for any pointers on where to start looking.

I'm using CMF4.0 currently, so Hive 0.8.1.

Thanks a lot!

David Morel
