Hi,
I am trying to fix the "last reducer hangs in GC because of truckloads
of data" problem that I see on some queries by running
SET hive.optimize.skewjoin=true; before the query. Unfortunately, every
time I try this, I get an error of the form:
...
2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
Ended Job = job_201211281801_0463
java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
    at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
    at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
    at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
...
Googling didn't turn up anything on how to debug or solve this, so
I'd be glad of any pointers on where to start looking.
I'm using CMF4.0 currently, so Hive 0.8.1.
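In case it helps, here is a minimal sketch of the kind of session
involved. The table and column names are placeholders rather than my
actual query, and hive.skewjoin.key is shown at what I understand to be
its default value, so treat both as assumptions.

```sql
-- Enable the runtime skew join optimization for this session.
SET hive.optimize.skewjoin=true;
-- Join keys with more rows than this are treated as skewed
-- (100000 is, as far as I know, the default; shown only for clarity).
SET hive.skewjoin.key=100000;

-- Hypothetical tables and columns, for illustration only.
SELECT a.id, b.val
FROM big_table a
JOIN other_table b ON (a.id = b.id);
```

The failure shows up after the join stage reports 100% for both map and
reduce, when the conditional skew join task is resolved.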
Thanks a lot!
David Morel