JoneZhang created HIVE-12716:
--------------------------------

             Summary: Hive on Spark map-join throw NullPointerException
                 Key: HIVE-12716
                 URL: https://issues.apache.org/jira/browse/HIVE-12716
             Project: Hive
          Issue Type: Bug
    Affects Versions: 1.2.1, 1.1.1
            Reporter: JoneZhang


The query is 
set hive.execution.engine=spark;
select
t3.pcid,channel,version,ip,hour,app_id,app_name,app_apk,app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num,
(case when t4.cnt is null then 0 else 1 end) as is_evil
from
(select /*+mapjoin(t2)*/
pcid,channel,version,ip,hour,
(case when t2.app_id is null then t1.app_id else t2.app_id end) as app_id,
t2.name as app_name,
app_apk,
app_version,app_type,dwl_tool,dwl_status,err_type,dwl_store,dwl_maxspeed,dwl_minspeed,dwl_avgspeed,last_time,dwl_num
from
t_ed_soft_downloadlog_molo t1 left outer join t_rd_soft_app_pkg_name t2 on 
(lower(t1.app_apk) = lower(t2.package_id) and t1.ds = 20151217 and t2.ds = 
20151217)
where
t1.ds = 20151217) t3
left outer join
(
select pcid,count(1) cnt  from t_ed_soft_evillog_molo where ds=20151217  group 
by pcid
) t4
on t3.pcid=t4.pcid;

Create table statements are as follows 
CREATE TABLE `t_ed_soft_downloadlog_molo`(
  `pcid` string, 
  `channel` string, 
  `version` string, 
  `ip` string, 
  `hour` string, 
  `app_id` bigint, 
  `app_name` string, 
  `app_apk` string, 
  `app_version` string, 
  `app_type` string, 
  `dwl_tool` string, 
  `dwl_status` string, 
  `err_type` string, 
  `dwl_store` string, 
  `dwl_maxspeed` string, 
  `dwl_minspeed` string, 
  `dwl_avgspeed` string, 
  `last_time` date, 
  `dwl_num` int)
PARTITIONED BY ( 
  `ds` bigint);

CREATE TABLE `t_rd_soft_app_pkg_name`(
  `app_id` bigint, 
  `cp_id` bigint, 
  `cat_id` bigint, 
  `package_id` string, 
  `name` string)
PARTITIONED BY ( 
  `ds` bigint);

CREATE TABLE `t_ed_soft_evillog_molo`(
  `imp_date` string, 
  `uin` string, 
  `pcid` string, 
  `appid` string, 
  `domain` string, 
  `action_type` string, 
  `via` string)
PARTITIONED BY ( 
  `ds` bigint);


And the error log is 
2015-12-18 08:10:18,685 INFO  [main]: spark.SparkMapJoinOptimizer 
(SparkMapJoinOptimizer.java:process(79)) - Check if it can be converted to map 
join
2015-12-18 08:10:18,686 ERROR [main]: ql.Driver 
(SessionState.java:printError(966)) - FAILED: NullPointerException null
java.lang.NullPointerException
        at 
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedParentMapJoinSize(SparkMapJoinOptimizer.java:312)
        at 
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getConnectedMapJoinSize(SparkMapJoinOptimizer.java:292)
        at 
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.getMapJoinConversionInfo(SparkMapJoinOptimizer.java:271)
        at 
org.apache.hadoop.hive.ql.optimizer.spark.SparkMapJoinOptimizer.process(SparkMapJoinOptimizer.java:80)
        at 
org.apache.hadoop.hive.ql.optimizer.spark.SparkJoinOptimizer.process(SparkJoinOptimizer.java:58)
        at 
org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:92)
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:97)
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:81)
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:135)
        at 
org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:112)
        at 
org.apache.hadoop.hive.ql.parse.spark.SparkCompiler.optimizeOperatorPlan(SparkCompiler.java:128)
        at 
org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:102)
        at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10238)
        at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:210)
        at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:233)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:425)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:308)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1123)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1171)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1060)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1050)
        at 
org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:208)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:160)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:447)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:357)
        at 
org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:795)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:767)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:704)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:212)


Some properties on hive-site.xml is 
        <property>
                <name>hive.ignore.mapjoin.hint</name>
                <value>false</value>
        </property>
        <property>
        <name>hive.auto.convert.join</name>
        <value>true</value>
        </property>
        <property>
                <name>hive.auto.convert.join.noconditionaltask</name>
                <value>true</value>
        </property>


The error relevant code is 
long mjSize = ctx.getMjOpSizes().get(op);
I think it should be checked whether or not  ctx.getMjOpSizes().get(op) is null.
Of course, more strict logic need to you to decide.


Thanks.
Best Wishes.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to