[ https://issues.apache.org/jira/browse/HIVE-21799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855299#comment-16855299 ]
Vineet Garg commented on HIVE-21799:
------------------------------------

The existing logic to look up the {{ExprNode}} for a given column name appears wrong:
{code:java}
ColumnInfo columnInfo = parentOfRS.getSchema().getColumnInfo(internalColName);
{code}
The line above fetches the column info from {{parentOfRS}}'s schema, assuming {{parentOfRS}} outputs a column named {{internalColName}}.
{code:java}
ExprNodeDesc exprNode = null;
if (parentOfRS.getColumnExprMap() != null) {
  exprNode = parentOfRS.getColumnExprMap().get(internalColName).clone();
} else {
  exprNode = new ExprNodeColumnDesc(columnInfo);
}
{code}
But this logic then looks up the same {{internalColName}} in {{columnExprMap}}, which is a mapping from {{parentOfRS}}'s input column names to whatever corresponding expressions {{parentOfRS}} will emit. That lookup succeeds only if {{parentOfRS}} does not rename the input column and passes it through as is. Assuming {{internalColName}} refers to the column coming out of {{parentOfRS}}, this should simply be:
{code:java}
exprNode = new ExprNodeColumnDesc(columnInfo);
{code}
I believe this change will also fix the issue here. In fact, the code should then go ahead and create the semi-join instead of returning.
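The failure mechanism described above can be illustrated with a minimal, self-contained sketch. This is not Hive code: plain {{java.util.Map}} stands in for {{columnExprMap}}, and the column names {{"_col0"}} and {{"KEY._col0"}} are made up for illustration. The point is that when the map's keys come from a different naming side than {{internalColName}}, {{get()}} returns {{null}}, and a subsequent dereference (such as the {{.clone()}} call in the real code) throws the {{NullPointerException}}.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an operator that renames its input column
// "_col0" to "KEY._col0". The map is keyed by one naming side only, so a
// lookup using the other side's name finds nothing.
public class ColumnExprMapSketch {

    static final Map<String, String> colExprMap = new HashMap<>();
    static {
        // made-up mapping: output-side name -> expression string
        colExprMap.put("KEY._col0", "Column[_col0]");
    }

    // Mirrors the buggy lookup: when internalColName is not a key in the
    // map, get() returns null; the real code then calls .clone() on the
    // result, which is where the NPE surfaces.
    static String lookup(String internalColName) {
        return colExprMap.get(internalColName);
    }

    public static void main(String[] args) {
        // Succeeds only when the name passes through unchanged.
        System.out.println(lookup("KEY._col0")); // prints "Column[_col0]"
        System.out.println(lookup("_col0"));     // prints "null"
    }
}
```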
> NullPointerException in DynamicPartitionPruningOptimization, when join key is
> on aggregation column
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-21799
>                 URL: https://issues.apache.org/jira/browse/HIVE-21799
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>         Attachments: HIVE-21799.1.patch, HIVE-21799.2.patch, HIVE-21799.3.patch
>
> Following table/query results in NPE:
> {noformat}
> create table tez_no_dynpart_hashjoin_on_agg(id int, outcome string, eventid int) stored as orc;
>
> explain select a.id, b.outcome from (select id, max(eventid) as event_id_max from tez_no_dynpart_hashjoin_on_agg group by id) a
> LEFT OUTER JOIN tez_no_dynpart_hashjoin_on_agg b
> on a.event_id_max = b.eventid;
> {noformat}
> Stack trace:
> {noformat}
> java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.generateSemiJoinOperatorPlan(DynamicPartitionPruningOptimization.java:608)
>         at org.apache.hadoop.hive.ql.optimizer.DynamicPartitionPruningOptimization.process(DynamicPartitionPruningOptimization.java:239)
>         at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
>         at org.apache.hadoop.hive.ql.lib.ForwardWalker.walk(ForwardWalker.java:74)
>         at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.runDynamicPartitionPruning(TezCompiler.java:584)
>         at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:165)
>         at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
>         at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12562)
>         at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:370)
>         at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:289)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:671)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1905)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1852)
>         at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1847)
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.compileAndRespond(ReExecDriver.java:126)
>         at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:219)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:242)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:189)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:408)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:340)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClientInternal(QTestUtil.java:676)
>         at org.apache.hadoop.hive.ql.QTestUtil.executeClient(QTestUtil.java:647)
>         at org.apache.hadoop.hive.cli.control.CoreCliDriver.runTest(CoreCliDriver.java:182)
>         at org.apache.hadoop.hive.cli.control.CliAdapter.runTest(CliAdapter.java:104)
>         at org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver(TestMiniLlapLocalCliDriver.java:59)
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)