[ https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554348 ]
ASF GitHub Bot logged work on HIVE-23882:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/21 17:19
            Start Date: 18/Feb/21 17:19
    Worklog Time Spent: 10m
      Work Description:
kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640

##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                   TableScan
                     alias: src
                     filterExpr: key is not null (type: boolean)
+                    probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
Why is `keyRatio` above 1? Shouldn't it mean the expected selectivity of the operation?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }

   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
I think `skipRSParent` is a bit misleading; you don't want to skip the RS, you want to stay in the same vertex.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
       List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-      String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-      if (realTSColName != null) {
+      ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+      if (originTSColExpr == null) {
+        LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
The current algorithm seems to be:
   * select the best MJ candidate
   * do some further processing, which may bail out

Bailing out for the best candidate doesn't necessarily mean that we would also bail out for a less charming candidate. I think it might be worth restructuring the extra compilation into the for loop; alternatively, instead of selecting the best candidate, the first part could be implemented as priority logic.

Just an idea for a follow-up.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
             String outputColumnName = cSELOutputColumnNames.get(i);
             ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
             ExprNodeDesc newPSELExprNodeDesc =
-                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
Instead of modifying every call site, can we have a method with the original signature?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 554348)
            Time Spent: 2h 40m  (was: 2.5h)

> Compiler should skip MJ keyExpr for probe optimization
> ------------------------------------------------------
>
>                 Key: HIVE-23882
>                 URL: https://issues.apache.org/jira/browse/HIVE-23882
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In probe we cannot currently support key expressions (on the big table side),
> as ORC CVs probe the small-table HT directly (there is no expr evaluation at
> that level).
> TezCompiler should take this into account when picking MJs to push probe
> details.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
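
Two of the review suggestions above (keeping a method with the original signature, and falling back to a lower-priority MJ candidate when the best one bails out) can be illustrated with a minimal, self-contained Java sketch. Nothing here is Hive code: the class `ReviewSketch`, the String-based stand-ins for `ExprNodeDesc`, the flag name `stayInVertex`, and the helpers `firstUsable` and `originColumn` are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-in for the patterns discussed in the review; not Hive's API.
final class ReviewSketch {

  // Widened variant with an extra flag, analogous to the new fifth parameter in the PR.
  // The body is a toy transformation so that the effect of each flag is observable.
  static List<String> backtrack(List<String> sources, boolean foldExpr, boolean stayInVertex) {
    List<String> result = new ArrayList<>();
    for (String source : sources) {
      String expr = foldExpr ? source.trim() : source;
      result.add(stayInVertex ? expr + "@sameVertex" : expr);
    }
    return result;
  }

  // Original signature kept as a thin delegating overload, so existing call
  // sites compile unchanged and implicitly get the old behaviour (flag = false).
  static List<String> backtrack(List<String> sources, boolean foldExpr) {
    return backtrack(sources, foldExpr, false);
  }

  // Priority-style selection: walk the candidates from best to worst and return
  // the first one whose further processing does not bail out (empty = bail out).
  static <C, R> Optional<R> firstUsable(List<C> rankedCandidates,
                                        Function<C, Optional<R>> process) {
    for (C candidate : rankedCandidates) {
      Optional<R> processed = process.apply(candidate);
      if (processed.isPresent()) {
        return processed;
      }
    }
    return Optional.empty();
  }

  // Toy "further processing": only a plain column name is usable, while a key
  // expression such as upper(key) bails out (assumption made for this sketch).
  static Optional<String> originColumn(String keyExpr) {
    return keyExpr.contains("(") ? Optional.empty() : Optional.of(keyExpr);
  }

  public static void main(String[] args) {
    // Old call site, untouched by the signature change.
    System.out.println(backtrack(Arrays.asList(" key "), true));
    // New call site that opts in to the extra flag.
    System.out.println(backtrack(Arrays.asList("key"), false, true));
    // The best-ranked candidate bails out, so the next one is picked.
    System.out.println(firstUsable(Arrays.asList("upper(key)", "key"),
        ReviewSketch::originColumn));
  }
}
```

With the delegating overload, only callers that need the new behaviour pass the extra argument; with `firstUsable`, a bail-out on the best candidate no longer ends the search.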