mapjoin followed by union all, groupby does not work
----------------------------------------------------

                 Key: HIVE-2262
                 URL: https://issues.apache.org/jira/browse/HIVE-2262
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
    Affects Versions: 0.7.1
            Reporter: yu xiang
            Priority: Trivial
             Fix For: 0.7.1


sql:
CREATE TABLE nulltest2(int_data1 INT, int_data2 INT, boolean_data BOOLEAN,
  double_data DOUBLE, string_data STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE nulltest3(int_data1 INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

explain
select int_data2, count(1)
from (
  select /*+mapjoin(a)*/ int_data2, 1 as c1, 0 as c2
  from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
  union all
  select /*+mapjoin(a)*/ int_data2, 1 as c1, 2 as c2
  from nulltest2 a join nulltest3 b on (a.int_data1 = b.int_data1)
) mapjointable
group by int_data2;

exception:
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.optimizer.ppr.PartitionPruner.prune(PartitionPruner.java:156)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:551)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.setTaskPlan(GenMapRedUtils.java:514)
        at org.apache.hadoop.hive.ql.optimizer.GenMapRedUtils.initPlan(GenMapRedUtils.java:125)
        at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink1.process(GenMRRedSink1.java:76)
        at org.apache.hadoop.hive.ql.optimizer.GenMRRedSink3.process(GenMRRedSink3.java:64)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)

Analysis of the cause:
1. When mapjoin, union and group by are used together,
UnionProcFactory.MapJoinUnion (in the optimizer) sets MapJoinSubq to true and
sets up the UnionParseContext.
2. In GenMRUnion1, Hive calls mergeMapJoinUnion, which also sets the task plan.
3. In GenMRRedSink3, Hive checks uCtx.isMapOnlySubq() and calls
GenMRRedSink1.process() to initialize the plan. But the utask's plan has
already been set at this point, so only the reducer still needs to be set.
In addition, the utask is processing a temporary table, so there is no topOp
that maps to a table. This is where the NullPointerException is thrown.

Solutions:
1. SQL solution: rewrite the query to use a subquery.
2. Code solution: in mergeMapJoinUnion, after the task plan has been set, set a
settaskplan flag to true to indicate that the plan for this utask has been
set. Then, in GenMRRedSink3, if this flag is true, do not call
GenMRRedSink1.process() to re-initialize the plan:
++++++++++++++++++++++++++++
if (uCtx.isMapOnlySubq() && !upc.isIssetTaskPlan())
++++++++++++++++++++++++++++
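
To make the proposed guard easier to discuss, here is a minimal, self-contained sketch of the control flow described above. It is only a model under stated assumptions, not Hive code: UnionCtx, UnionTask, TOP_OP_TO_TABLE, the alias name and every method body are hypothetical stand-ins that borrow the names from the analysis (mergeMapJoinUnion, setTaskPlan, the map-only-subquery check). The real GenMRRedSink3 and GenMapRedUtils do much more than this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the flow in the report; not Hive's real classes.
class SetTaskPlanGuardSketch {

    /** Stand-in for the union parse context (uCtx / upc in the report). */
    static class UnionCtx {
        boolean mapOnlySubq;  // set by the optimizer for mapjoin + union
        boolean taskPlanSet;  // the proposed "settaskplan" flag
    }

    /** Stand-in for the union task (utask). */
    static class UnionTask {
        boolean planSet;
        boolean reducerSet;
    }

    /** Maps a top-operator alias to a table; a temporary table has no entry. */
    static final Map<String, String> TOP_OP_TO_TABLE = new HashMap<>();

    /** Models setTaskPlan: it needs a table for the alias and fails without one. */
    static void setTaskPlan(UnionTask task, String alias) {
        String table = TOP_OP_TO_TABLE.get(alias);
        if (table == null) {
            throw new NullPointerException("no table for alias " + alias);
        }
        task.planSet = true;
    }

    /** Models mergeMapJoinUnion: sets the plan and records that it did so. */
    static void mergeMapJoinUnion(UnionCtx ctx, UnionTask task) {
        task.planSet = true;
        ctx.taskPlanSet = true;
    }

    /** Models the reduce-sink step, with or without the proposed guard. */
    static void processReduceSink(UnionCtx ctx, UnionTask task,
                                  String alias, boolean useGuard) {
        boolean reinit = ctx.mapOnlySubq && !(useGuard && ctx.taskPlanSet);
        if (reinit) {
            setTaskPlan(task, alias);  // re-initializes a plan that is already set
        }
        task.reducerSet = true;        // the only step that is still needed
    }

    /** Runs the scenario once; returns true if it finished without an NPE. */
    static boolean run(boolean useGuard) {
        UnionCtx ctx = new UnionCtx();
        ctx.mapOnlySubq = true;
        UnionTask task = new UnionTask();
        mergeMapJoinUnion(ctx, task);
        try {
            processReduceSink(ctx, task, "tmp_union_alias", useGuard);
            return task.planSet && task.reducerSet;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("without guard: " + run(false));
        System.out.println("with guard:    " + run(true));
    }
}
```

In this model the run without the guard ends in a NullPointerException, because the temporary table has no entry in the map, while the run with the guard skips the second plan initialization and only sets the reducer. Whether the same holds in the real planner depends on details the sketch leaves out.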

I don't know whether the code solution is suitable.
Is there any better solution?
thx






        
