Have you tried splitting the query into 2 or 3 steps and/or enabling map joins (SET hive.auto.convert.join = true;) if some of the tables are smallish?
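Something along these lines is what I have in mind. It is only a rough sketch: the table and column names (fact, dim_a, dim_b, dim_c, tmp_step1, result and their keys) are made up for illustration, since I don't know your schema, and whether the map join conversion actually kicks in depends on your Hive version and table sizes.

```sql
-- Hypothetical schema: fact is the large table, dim_a and dim_b are small,
-- dim_c is large. None of these names come from the original query.

-- Let Hive convert joins against small tables into map-side joins.
SET hive.auto.convert.join = true;

-- Step 1: join the large table to the small tables and materialize the result.
CREATE TABLE tmp_step1 AS
SELECT f.id, f.key_c, a.a_val, b.b_val
FROM fact f
LEFT OUTER JOIN dim_a a ON (f.key_a = a.key_a)
LEFT OUTER JOIN dim_b b ON (f.key_b = b.key_b);

-- Step 2: join the intermediate table to the remaining table.
CREATE TABLE result AS
SELECT s.id, s.a_val, s.b_val, c.c_val
FROM tmp_step1 s
LEFT OUTER JOIN dim_c c ON (s.key_c = c.key_c);
```

Breaking it up this way also makes it easier to see which join is sending most of the rows to the one slow reducer.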
On Tue, Jun 7, 2011 at 12:31 PM, Shantian Purkad <shantian_pur...@yahoo.com> wrote:

> Hi,
>
> I have a query which joins 12 different tables (most of them left outer
> joins) and the query takes almost 3 hours. 90% of the time is taken by a
> single reducer. One reducer is getting bulk of the data to process.
>
> How can I get around this and have fair distribution of data across all
> reducers? I tried to enable the skewjoin optimization but getting below NPE
> after first step of the job is executed.
>
> Any suggestions/ideas will be of great help.
>
> Thanks,
> Shantian
>
> 2011-06-07 19:22:28,923 Stage-11 map = 100%, reduce = 85%
> 2011-06-07 19:22:30,932 Stage-11 map = 100%, reduce = 100%
> Ended Job = job_201106071542_0010
> java.lang.NullPointerException
>     at org.apache.hadoop.hive.ql.plan.ConditionalResolverSkewJoin.getTasks(ConditionalResolverSkewJoin.java:97)
>     at org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
>     at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:130)
>     at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
>     at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1063)
>     at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:900)
>     at org.apache.hadoop.hive.ql.Driver.run(Driver.java:748)
>     at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
>     at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
>     at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
> FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.ConditionalTask
> hive>