We have given hints to use mapside joins on small tables.
We are planning to break this query into multiple, but would prefer options that help us keep the queries as is (with few modifications and tuning instead of breaking the queries into multiple steps as there is quite bit of complicate logic and dependencies in the joins) ________________________________ From: Igor Tatarinov <i...@decide.com> To: user@hive.apache.org; Shantian Purkad <shantian_pur...@yahoo.com> Sent: Tuesday, June 7, 2011 12:58 PM Subject: Re: Skew Join Optimization in hive Have you tried splitting the query into 2 or 3 steps and/or enabling map jons (SET hive.auto.convert.join = true;) if some of the tables are smallish? On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad <shantian_pur...@yahoo.com> wrote: Hi, > >I have a query which joins 12 different tables (most of them left outer joins) >and the query takes almost 3 hours. 90% of the time is taken by a single >reducer. One reducer is getting bulk of the data to process. > >How can I get around this and have fair distribution of data across all >reducers? I tried to enable the skewjoin optimization but getting below NPE >after first step of the job is executed. > >Any suggestions/ideas will be or great help. > >Thanks, >Shantian > >2011-06-07 19:22:28,923 Stage-11 map = 100%, reduce = 85% >2011-06-07 19:22:30,932 Stage-11 map = 100%, reduce = 100% >Ended Job = job_201106071542_0010 >java.lang.NullPointerException > at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97) > at >org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130) > at >org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at >sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at >sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:186) >FAILED: Execution Error, return code -101 from >org.apache.hadoop.hive.ql.exec.ConditionalTask >hive> > >