Re: Skew Join Optimization in hive

Shantian Purkad Tue, 07 Jun 2011 13:55:54 -0700

We have given hints to use mapside joins on small tables.


We are planning to break this query into multiple, but would prefer options 
that help us keep the queries as is (with few modifications and tuning instead 
of breaking the queries into multiple steps as there is quite bit of complicate 
logic and dependencies in the joins)



________________________________
From: Igor Tatarinov <i...@decide.com>
To: user@hive.apache.org; Shantian Purkad <shantian_pur...@yahoo.com>
Sent: Tuesday, June 7, 2011 12:58 PM
Subject: Re: Skew Join Optimization in hive


Have you tried splitting the query into 2 or 3 steps and/or enabling map jons 
(SET hive.auto.convert.join = true;) if some of the tables are smallish?



On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad <shantian_pur...@yahoo.com> 
wrote:

Hi,
>
>I have a query which joins 12 different tables (most of them left outer joins) 
>and the query takes almost 3 hours. 90% of the time is taken by a single 
>reducer. One reducer is getting bulk of the data to process.
>
>How can I get around this and have fair distribution of data across all 
>reducers? I tried to enable the skewjoin optimization but getting below NPE 
>after first step of the job is executed.
>
>Any suggestions/ideas will be or great help.
>
>Thanks,
>Shantian
>
>2011-06-07 19:22:28,923 Stage-11 map = 100%,  reduce = 85%
>2011-06-07 19:22:30,932 Stage-11 map = 100%,  reduce = 100%
>Ended Job = job_201106071542_0010
>java.lang.NullPointerException
>    at
 
org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>    at 
>org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>    at 
>org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>    at
 sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    at 
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>    at 
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>    at java.lang.reflect.Method.invoke(Method.java:597)
>    at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
>FAILED: Execution Error, return code -101 from 
>org.apache.hadoop.hive.ql.exec.ConditionalTask
>hive> 
>
>

Re: Skew Join Optimization in hive

Reply via email to