Re: Skew join failure

Mark Grover Fri, 30 Nov 2012 07:46:49 -0800

Hi David,
It seems like Hive is unable to find the skewed keys on HDFS.
Did you set the hive.skewjoin.key property? If so, to what value?
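
For reference, a minimal sketch of how the two settings could be checked and set in the same Hive session. The value 100000 is Hive's documented default for hive.skewjoin.key and is shown only as an illustration, not as a recommended fix for this error:

```sql
-- Print the current values (SET with no "=value" just displays the property).
SET hive.optimize.skewjoin;
SET hive.skewjoin.key;

-- Enable the skew join optimization, as in the original query.
SET hive.optimize.skewjoin=true;

-- Row-count threshold above which a join key is treated as skewed.
-- 100000 is the default; the right value for this data set is an assumption.
SET hive.skewjoin.key=100000;
```

Whether changing the threshold affects the missing hive_skew_join_bigkeys_0 directory is something I have not verified.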


Mark

On Fri, Nov 30, 2012 at 2:10 AM, David Morel <david.mo...@amakuru.net> wrote:

> Hi,
>
> I am trying to solve the "last reducer hangs because of GC because of
> truckloads of data" issue that I have on some queries, by using SET
> hive.optimize.skewjoin=true; Unfortunately, every time I try this, I
> encounter an error of the form:
> ...
> 2012-11-30 10:42:39,181 Stage-10 map = 100%,  reduce = 100%, Cumulative CPU 406984.1 sec
> MapReduce Total cumulative CPU time: 4 days 17 hours 3 minutes 4 seconds 100 msec
> Ended Job = job_201211281801_0463
> java.io.FileNotFoundException: File hdfs://nameservice1/tmp/hive-dmorel/hive_2012-11-30_09-23-00_375_8178040921995939301/-mr-10014/hive_skew_join_bigkeys_0 does not exist.
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:365)
>         at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:96)
>         at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>         at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:133)
>         at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>         at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1332)
>         at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1123)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:931)
> ...
>
> Googling didn't give me any indication on how to debug/solve this, so I'd
> be glad if I could get any indication where to start looking.
>
> I'm using CMF4.0 currently, so Hive 0.8.1.
>
> Thanks a lot!
>
> David Morel
>
