Hi David,

It seems like Hive is unable to find the skewed keys on HDFS. Did you set the hive.skewjoin.key property? If so, to what value?
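For reference, here is a minimal sketch of the two settings involved, run in a Hive CLI session. The value 100000 is only the default for hive.skewjoin.key, used here as a placeholder rather than a recommendation for your data:

```sql
-- Turn on runtime skew-join handling (off by default).
SET hive.optimize.skewjoin=true;
-- A join key seen in more than this many rows is treated as skewed.
-- 100000 is the default, shown only as a placeholder value.
SET hive.skewjoin.key=100000;
-- SET with only the property name prints its current value.
SET hive.skewjoin.key;
```

The last statement is a quick way to check which value your session is actually using.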
Mark

On Fri, Nov 30, 2012 at 2:10 AM, David Morel <david.mo...@amakuru.net> wrote:

> Hi,
>
> I am trying to solve the "last reducer hangs because of GC because of
> truckloads of data" issue that I have on some queries, by using SET
> hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
> encounter an error of the form:
> ...
> 2012-11-30 10:42:39,181 Stage-10 map = 100%, reduce = 100%, Cumulative CPU 406984.1 sec
> MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
> Ended Job = job_201211281801_0463
> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
> ...
>
> Googling didn't give me any indication on how to debug/solve this, so I'd
> be glad if I could get any indication where to start looking.
>
> I'm using CMF4.0 currently, so Hive 0.8.1.
>
> Thanks a lot!
>
> David Morel