[ https://issues.apache.org/jira/browse/HIVE-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855299#comment-16855299 ]

Vineet Garg commented on HIVE-21799:
------------------------------------

The existing logic to get the {{ExprNode}} for the given column name seems wrong.
{code:java}
ColumnInfo columnInfo = parentOfRS.getSchema().getColumnInfo(internalColName);
{code}
The above retrieves the column info from {{parentOfRS}}'s schema, assuming {{parentOfRS}} is outputting a column named {{internalColName}}.
{code:java}
ExprNodeDesc exprNode = null;
if (parentOfRS.getColumnExprMap() != null) {
  exprNode = parentOfRS.getColumnExprMap().get(internalColName).clone();
} else {
  exprNode = new ExprNodeColumnDesc(columnInfo);
}
{code}
But this logic looks up the same column {{internalColName}} in {{columnExprMap}}, which is a mapping from {{parentOfRS}}'s input column names to the corresponding expressions {{parentOfRS}} will emit. This works only if {{parentOfRS}} does not change the input column and outputs it as is; otherwise {{get()}} returns null and the subsequent {{clone()}} throws the NullPointerException.

Assuming that {{internalColName}} refers to the column coming out of {{parentOfRS}}, this should just be
{code:java}
exprNode = new ExprNodeColumnDesc(columnInfo);
{code}
I believe this change will also fix the issue here. In fact, it should then go ahead and create the semi join instead of returning.
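To make the mismatch concrete, here is a minimal, self-contained sketch of the lookup. It is plain Java with {{String}} values standing in for {{ExprNodeDesc}}, and the helper names ({{resolveFromMap}}, {{resolveFromSchema}}) are hypothetical stand-ins, not Hive's actual APIs:

```java
import java.util.Map;

// Toy model of the two lookup strategies discussed above.
public class ColumnExprLookup {

    // Models parentOfRS.getColumnExprMap().get(internalColName).clone():
    // the map here is keyed by the operator's *input* column names.
    static String resolveFromMap(Map<String, String> columnExprMap,
                                 String internalColName) {
        // When the parent renames the column (e.g. an aggregate emits "_col1"),
        // get() misses, returns null, and the subsequent clone() in Hive NPEs.
        return columnExprMap.get(internalColName);
    }

    // Models the proposed fix, new ExprNodeColumnDesc(columnInfo): build the
    // expression directly from the output schema's ColumnInfo, no map lookup.
    static String resolveFromSchema(String outputColName) {
        return "col:" + outputColName;
    }
}
```

With an aggregate parent, the map is keyed by the input name ({{eventid}}) while the output schema carries {{_col1}}, so the map-based lookup returns null; building the expression from the output {{ColumnInfo}} sidesteps the lookup entirely.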

> NullPointerException in DynamicPartitionPruningOptimization, when join key is 
> on aggregation column
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21799
>                 URL: https://issues.apache.org/jira/browse/HIVE-21799
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-21799.1.patch, HIVE-21799.2.patch, HIVE-21799.3.patch
>
>
> Following table/query results in NPE:
> {noformat}
> create table tez_no_dynpart_hashjoin_on_agg(id int, outcome string, eventid int) stored as orc;
> explain select a.id, b.outcome from (select id, max(eventid) as event_id_max from tez_no_dynpart_hashjoin_on_agg group by id) a
> LEFT OUTER JOIN tez_no_dynpart_hashjoin_on_agg b
> on a.event_id_max = b.eventid;
> {noformat}
> Stack trace:
> {noformat}
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:608)
>         at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:239)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:584)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:165)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
>         at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
>         at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>         at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
