[ 
https://issues.apache.org/jira/browse/HIVE-23882?focusedWorklogId=554348&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-554348
 ]

ASF GitHub Bot logged work on HIVE-23882:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Feb/21 17:19
            Start Date: 18/Feb/21 17:19
    Worklog Time Spent: 10m 
      Work Description: kgyrtkirk commented on a change in pull request #1286:
URL: https://github.com/apache/hive/pull/1286#discussion_r578602640



##########
File path: ql/src/test/results/clientpositive/llap/auto_join10.q.out
##########
@@ -57,6 +57,7 @@ STAGE PLANS:
                 TableScan
                   alias: src
                   filterExpr: key is not null (type: boolean)
+                  probeDecodeDetails: cacheKey:HASH_MAP_MAPJOIN_30_container, bigKeyColName:key, smallTablePos:0, keyRatio:1.582

Review comment:
       Why is `keyRatio` above 1? Shouldn't it represent the expected selectivity of 
the operation?

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDescUtils.java
##########
@@ -362,26 +362,26 @@ public static boolean isDeterministic(ExprNodeDesc desc) {
    */
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
       Operator<?> current, Operator<?> terminal) throws SemanticException {
-    return backtrack(sources, current, terminal, false);
+    return backtrack(sources, current, terminal, false, false);
   }
 
   public static ArrayList<ExprNodeDesc> backtrack(List<ExprNodeDesc> sources,
-      Operator<?> current, Operator<?> terminal, boolean foldExpr) throws SemanticException {
-    ArrayList<ExprNodeDesc> result = new ArrayList<ExprNodeDesc>();
+      Operator<?> current, Operator<?> terminal, boolean foldExpr, boolean skipRSParent) throws SemanticException {

Review comment:
       I think `skipRSParent` is a bit misleading; you don't want to skip the 
RS, you want to stay in the same vertex.

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/TezCompiler.java
##########
@@ -1589,13 +1588,17 @@ private void removeSemijoinsParallelToMapJoin(OptimizeTezProcContext procCtx)
 
       List<ExprNodeDesc> keyDesc = selectedMJOp.getConf().getKeys().get(posBigTable);
       ExprNodeColumnDesc keyCol = (ExprNodeColumnDesc) keyDesc.get(0);
-      String realTSColName = OperatorUtils.findTableColNameOf(selectedMJOp, keyCol.getColumn());
-      if (realTSColName != null) {
+      ExprNodeColumnDesc originTSColExpr = OperatorUtils.findTableOriginColExpr(keyCol, selectedMJOp, tsOp);
+      if (originTSColExpr == null) {
+        LOG.warn("ProbeDecode could not find origTSCol for mjCol: {} with MJ Schema: {}",

Review comment:
       The current algorithm seems to be:
   * select the best MJ candidate
   * do some further processing, which may bail out
   
   Bailing out for the best candidate doesn't necessarily mean that we would 
also bail out for a less attractive candidate. I think it might be worth 
restructuring the extra compilation into a for loop over the candidates; 
alternatively, instead of selecting only the best candidate, the first part 
could be implemented as priority logic.
   
   Just an idea for a follow-up.
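
   A minimal sketch of that restructuring, under stated assumptions: `Candidate`, `score`, `plainColumnKey`, `resolveOriginColumn` and `selectFirstUsable` are hypothetical names made up for illustration and are not Hive classes or methods. The ordering (higher score first) is also an assumption. The only point shown is that the loop falls through to the next candidate when the extra processing bails out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: none of these types are Hive classes.
class ProbeCandidateFallback {

  /** Hypothetical map-join candidate. */
  static final class Candidate {
    final String name;
    final double score;            // assumption: higher means more attractive
    final boolean plainColumnKey;  // assumption: false models a key we cannot trace back

    Candidate(String name, double score, boolean plainColumnKey) {
      this.name = name;
      this.score = score;
      this.plainColumnKey = plainColumnKey;
    }
  }

  /** Stand-in for the extra processing; an empty result means "bail out for this candidate". */
  static Optional<String> resolveOriginColumn(Candidate c) {
    return c.plainColumnKey ? Optional.of(c.name + ".key") : Optional.empty();
  }

  /** Walks the candidates from best to worst and returns the first one that does not bail out. */
  static Optional<Candidate> selectFirstUsable(List<Candidate> candidates) {
    List<Candidate> ordered = new ArrayList<>(candidates);
    ordered.sort(Comparator.comparingDouble((Candidate c) -> c.score).reversed());
    for (Candidate c : ordered) {
      if (resolveOriginColumn(c).isPresent()) {
        return Optional.of(c);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    List<Candidate> candidates = Arrays.asList(
        new Candidate("mj_best", 0.9, false),
        new Candidate("mj_second", 0.5, true));
    // The best-scored candidate bails out, so the second one is chosen.
    System.out.println(selectFirstUsable(candidates).map(c -> c.name).orElse("none"));
  }
}
```

   With this shape, a bail-out on the top candidate costs one loop iteration instead of disabling the optimization for the whole table scan.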

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/optimizer/NonBlockingOpDeDupProc.java
##########
@@ -120,7 +120,7 @@ public Object process(Node nd, Stack<Node> stack, NodeProcessorCtx procCtx,
             String outputColumnName = cSELOutputColumnNames.get(i);
             ExprNodeDesc cSELExprNodeDesc = cSELColList.get(i);
             ExprNodeDesc newPSELExprNodeDesc =
-                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true);
+                ExprNodeDescUtils.backtrack(cSELExprNodeDesc, cSEL, pSEL, true, false);

Review comment:
       Instead of modifying every call site, can we keep a method with the 
original signature?
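
   Roughly what I have in mind, as a sketch rather than the real code: the class name `BacktrackOverloadSketch` is hypothetical, the `String` parameters stand in for `ExprNodeDesc` and `Operator<?>`, and the method body is a placeholder. Only the delegation pattern matters: the old four-argument overload stays and forwards `false` for the new flag, so existing callers compile unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: simplified types, placeholder logic, not the Hive implementation.
class BacktrackOverloadSketch {

  /** Original signature kept for existing call sites; the new flag defaults to false. */
  static List<String> backtrack(List<String> sources, String current, String terminal,
      boolean foldExpr) {
    return backtrack(sources, current, terminal, foldExpr, false);
  }

  /** Extended signature used only by callers that need the new behaviour. */
  static List<String> backtrack(List<String> sources, String current, String terminal,
      boolean foldExpr, boolean skipRSParent) {
    List<String> result = new ArrayList<>();
    for (String source : sources) {
      // Placeholder: records the arguments instead of walking the operator tree.
      result.add(source + "@" + terminal
          + (foldExpr ? ":fold" : "")
          + (skipRSParent ? ":skipRSParent" : ""));
    }
    return result;
  }

  public static void main(String[] args) {
    List<String> sources = Arrays.asList("colA", "colB");
    // Both calls go through the same five-argument implementation.
    System.out.println(backtrack(sources, "cSEL", "pSEL", true));
    System.out.println(backtrack(sources, "cSEL", "pSEL", true, true));
  }
}
```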




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 554348)
    Time Spent: 2h 40m  (was: 2.5h)

> Compiler should skip MJ keyExpr for probe optimization
> ------------------------------------------------------
>
>                 Key: HIVE-23882
>                 URL: https://issues.apache.org/jira/browse/HIVE-23882
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Panagiotis Garefalakis
>            Assignee: Panagiotis Garefalakis
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> In probe we cannot currently support key expressions (on the big-table side) 
> because ORC CVs probe the small-table HT directly (there is no expr evaluation 
> at that level).
> TezCompiler should take this into account when picking MJs to push probe 
> details to.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
