Hello,

I have been trying to optimize one of my longer-running queries using a
MAPJOIN hint. The query is fairly complex: it joins my base table (1+
billion rows) with multiple metadata tables, which are relatively small.

I already use a STREAMTABLE hint for my large table and have provided
a separate MAPJOIN hint for each metadata table.

Everything goes smoothly until the last step. Each map/reduce step before
it finishes in under 10 minutes, whereas previously it took 4+ hours. The
last step, however, does not complete even a single map task successfully
and fails with the following exception:

2011-04-07 19:09:49,254 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:188)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.MapJoinOperator.processOp(MapJoinOperator.java:218)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:81)
        at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:386)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:598)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:347)
        at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:171)
        ... 4 more

I know this looks more like a Hadoop-side exception, but since the jars
that run the job are auto-generated, I don't know where best to start
debugging this issue. Any pointers?

Also, when adding multiple hints, do I just comma-separate them, or is
there something else I need to take care of? For example:

SELECT /*+ STREAMTABLE(t1), MAPJOIN(t2), MAPJOIN(t3), MAPJOIN(t4) */ .....

Thanks,

Viral