[ https://issues.apache.org/jira/browse/HIVE-26006?focusedWorklogId=764036&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-764036 ]
ASF GitHub Bot logged work on HIVE-26006:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/Apr/22 07:22
            Start Date: 29/Apr/22 07:22
    Worklog Time Spent: 10m
      Work Description: kasakrisz commented on code in PR #3082:
URL: https://github.com/apache/hive/pull/3082#discussion_r861520601


##########
ql/src/java/org/apache/hadoop/hive/ql/plan/TopNKeyDesc.java:
##########
@@ -252,7 +252,8 @@ public TopNKeyDescExplainVectorization getTopNKeyVectorization() {
   public TopNKeyDesc combine(CommonKeyPrefix commonKeyPrefix) {
     return new TopNKeyDesc(topN, commonKeyPrefix.getMappedOrder(),
             commonKeyPrefix.getMappedNullOrder(), commonKeyPrefix.getMappedColumns(),
-            commonKeyPrefix.getMappedColumns().subList(0, partitionKeyColumns.size()),
+            commonKeyPrefix.getMappedColumns()
+                .subList(0, Math.min(partitionKeyColumns.size(), commonKeyPrefix.getMappedColumns().size())),

Review Comment:
   Yes, this prevents the `IndexOutOfBoundsException`.
   However, the rest of the patch is also useful: executing TopNKey (TNK) operators whose keys are all partition keys wastes CPU time, because such instances always behave as forward operators.
   TNK operators maintain a `TopNKeyFilter` for each distinct combination of partition key values. The TNK operator processes a row as follows:
   1. Extract the partition key values from the row.
   2. Look up the corresponding `TopNKeyFilter`.
   3. Extract the topN key values from the row and pass them to the `TopNKeyFilter`.
   4. The `TopNKeyFilter` stores the top n keys and decides whether the row can be forwarded.

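The clamped `subList` call and the four row-processing steps above can be sketched in isolation. This is a minimal illustration under stated assumptions, not Hive's implementation: `TopNKeySketch`, `partitionKeyPrefix`, `SimpleTopNFilter` and `PartitionedTopN` are hypothetical names, the filter keeps the n smallest keys using natural ordering, and rows are reduced to a (partition key, topN key) pair.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch; none of these names exist in Hive.
final class TopNKeySketch {

  /** Clamps the prefix length so subList never reads past the end of the list. */
  static <T> List<T> partitionKeyPrefix(List<T> mappedColumns, int partitionKeyCount) {
    return mappedColumns.subList(0, Math.min(partitionKeyCount, mappedColumns.size()));
  }

  /** Stand-in for a per-partition filter: remembers the n smallest keys seen so far. */
  static final class SimpleTopNFilter<K extends Comparable<K>> {
    private final int n;
    private final TreeSet<K> topKeys = new TreeSet<>();

    SimpleTopNFilter(int n) {
      if (n < 1) {
        throw new IllegalArgumentException("n must be at least 1");
      }
      this.n = n;
    }

    /** Step 4: store the key if it belongs to the top n and report whether the row may be forwarded. */
    boolean canForward(K key) {
      if (topKeys.contains(key)) {
        return true; // ties with an already stored key are forwarded
      }
      if (topKeys.size() < n) {
        topKeys.add(key);
        return true;
      }
      if (key.compareTo(topKeys.last()) < 0) {
        topKeys.pollLast(); // evict the current largest stored key
        topKeys.add(key);
        return true;
      }
      return false;
    }
  }

  /** Keeps one filter per distinct partition key value. */
  static final class PartitionedTopN<P, K extends Comparable<K>> {
    private final int n;
    private final Map<P, SimpleTopNFilter<K>> filters = new HashMap<>();

    PartitionedTopN(int n) {
      this.n = n;
    }

    /** Steps 1-3: the caller extracts both keys; look up the filter and hand it the topN key. */
    boolean process(P partitionKey, K topNKey) {
      return filters.computeIfAbsent(partitionKey, k -> new SimpleTopNFilter<>(n))
          .canForward(topNKey);
    }
  }

  public static void main(String[] args) {
    // One mapped column but two partition keys: the clamp avoids toIndex = 2 on a 1-element list.
    System.out.println(partitionKeyPrefix(Arrays.asList("a"), 2));

    PartitionedTopN<String, Integer> op = new PartitionedTopN<>(1);
    System.out.println(op.process("p1", 5));
    System.out.println(op.process("p1", 7));

    // When the topN key is the partition key itself, each filter only ever sees one key.
    PartitionedTopN<String, String> allPartitionKeys = new PartitionedTopN<>(1);
    System.out.println(allPartitionKeys.process("p1", "p1"));
    System.out.println(allPartitionKeys.process("p1", "p1"));
  }
}
```

In this sketch, a filter whose only key is the partition key never rejects a row, which is consistent with the observation that such operators act as plain forward operators while still paying for the lookup and comparison on every row.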
Issue Time Tracking
-------------------

    Worklog Id:     (was: 764036)
    Time Spent: 50m  (was: 40m)

> TopNKey and PTF with more than one column is failing with IOBE
> --------------------------------------------------------------
>
>                 Key: HIVE-26006
>                 URL: https://issues.apache.org/jira/browse/HIVE-26006
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Naresh P R
>            Assignee: Krisztian Kasa
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> {code:java}
> java.lang.IndexOutOfBoundsException: toIndex = 2
> at java.util.ArrayList.subListRangeCheck(ArrayList.java:1014)
> at java.util.ArrayList.subList(ArrayList.java:1006)
> at org.apache.hadoop.hive.ql.plan.TopNKeyDesc.combine(TopNKeyDesc.java:201)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdownThroughGroupBy(TopNKeyPushdownProcessor.java:162)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.pushdown(TopNKeyPushdownProcessor.java:76)
> at org.apache.hadoop.hive.ql.optimizer.topnkey.TopNKeyPushdownProcessor.process(TopNKeyPushdownProcessor.java:57)
> at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:158)
> at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.runTopNKeyOptimization(TezCompiler.java:1305)
> at org.apache.hadoop.hive.ql.parse.TezCompiler.optimizeOperatorPlan(TezCompiler.java:173)
> at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:159)
> at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:12646)
> at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:358)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:283)
> at org.apache.hadoop.hive.ql.Compiler.analyze(Compiler.java:219)
> at org.apache.hadoop.hive.ql.Compiler.compile(Compiler.java:103)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:215){code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)