[jira] [Work logged] (HIVE-25792) Multi Insert query fails on CBO path

ASF GitHub Bot (Jira) Wed, 15 Dec 2021 04:42:42 -0800


     [ https://issues.apache.org/jira/browse/HIVE-25792?focusedWorklogId=696590&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-696590 ]


ASF GitHub Bot logged work on HIVE-25792:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 15/Dec/21 12:41
            Start Date: 15/Dec/21 12:41
    Worklog Time Spent: 10m 
      Work Description: zabetak commented on a change in pull request #2865:
URL: https://github.com/apache/hive/pull/2865#discussion_r769579986



##########
File path: ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java
##########
@@ -673,8 +672,7 @@ Operator genOPTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticExcept
           }
           this.ctx.setCboInfo(cboMsg);
 
-          // Determine if we should re-throw the exception OR if we try to mark plan as reAnalyzeAST to retry
-          // planning as non-CBO.
+          // Determine if we should re-throw the exception OR if we try to mark the query to retry as non-CBO.

Review comment:
       I am wondering if it would be better to just throw here and let the 
reexecution plugin deal with what to do afterwards.
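
   The suggestion above could look roughly like the following. This is a minimal, self-contained sketch and not Hive code: `RethrowSketch`, `CboPlanningException`, `RetryPlugin`, and `compile` are hypothetical names introduced only for illustration. The point is that the planner only throws, and a separate plugin decides whether the non-CBO path is tried.

   ```java
   import java.util.function.Supplier;

   // Hypothetical names throughout; none of these types exist in Hive.
   final class RethrowSketch {

       /** Stand-in for a failure raised while planning with CBO. */
       static final class CboPlanningException extends RuntimeException {
           CboPlanningException(String message) {
               super(message);
           }
       }

       /** Stand-in for a re-execution plugin that decides what happens after a failure. */
       interface RetryPlugin {
           boolean shouldRecompileWithoutCbo(RuntimeException failure);
       }

       /**
        * Runs the CBO planner. On failure the planner simply throws, and the
        * plugin (not the planner) decides whether the non-CBO planner is tried.
        */
       static String compile(Supplier<String> cboPlanner,
                             Supplier<String> nonCboPlanner,
                             RetryPlugin plugin) {
           try {
               return cboPlanner.get();
           } catch (CboPlanningException e) {
               if (plugin.shouldRecompileWithoutCbo(e)) {
                   return nonCboPlanner.get();
               }
               // No plugin wants to retry: surface the original failure.
               throw e;
           }
       }
   }
   ```

   With this shape the retry decision lives in one place, so the planner does not need to know which fallback policy is active.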

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5561,7 +5563,8 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
         "Size of the runtime statistics cache. Unit is: OperatorStat entry; a query plan consist ~100."),
     HIVE_QUERY_PLANMAPPER_LINK_RELNODES("hive.query.planmapper.link.relnodes", true,
         "Whether to link Calcite nodes to runtime statistics."),
-
+    HIVE_QUERY_MAX_RECOMPILATION_COUNT("hive.query.recompilation.max.count", 1,
+        "Maximum number of re-compilations for a single query."),

Review comment:
       Why do we need the number of recompilations to be configurable?

##########
File path: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
##########
@@ -5536,10 +5536,12 @@ private static void populateLlapDaemonVarsSet(Set<String> llapDaemonVarsSetLocal
 
     HIVE_QUERY_REEXECUTION_ENABLED("hive.query.reexecution.enabled", true,
         "Enable query reexecutions"),
-    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies", "overlay,reoptimize,reexecute_lost_am,dagsubmit",
+    HIVE_QUERY_REEXECUTION_STRATEGIES("hive.query.reexecution.strategies",
+        "overlay,reoptimize,reexecute_lost_am,dagsubmit,recompile_without_cbo",
         "comma separated list of plugin can be used:\n"
             + "  overlay: hiveconf subtree 'reexec.overlay' is used as an overlay in case of an execution errors out\n"
             + "  reoptimize: collects operator statistics during execution and recompile the query after a failure\n"
+            + "  recompile_without_cbo: recompiles query after a CBO failure\n"
             + "  reexecute_lost_am: reexecutes query if it failed due to tez am node gets decommissioned"),

Review comment:
       I didn't go through the whole PR but I get the impression that we 
could/should combine the `hive.query.reexecution.strategies` and 
`hive.cbo.fallback.strategy` configurations somehow. Having both does not seem 
necessary and raises some questions about the expected behavior.
   
   Consider for instance the following:
   ```
   set hive.cbo.fallback.strategy = always;
   -- Note that recompile_without_cbo is missing 
   set hive.query.reexecution.strategies = overlay,reoptimize,reexecute_lost_am,dagsubmit;
   ```
   Reading this configuration, I am not sure what should happen when the 
CBO fails: the fallback strategy asks for a retry, but the 
`recompile_without_cbo` plugin is not enabled.
   
   The `hive.cbo.fallback.strategy` has not been released yet so we are free to 
drop it, or modify it to be consistent with `hive.query.reexecution.strategies`.
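
   One possible way to remove the ambiguity, sketched under the assumption that `hive.cbo.fallback.strategy` is dropped and the strategies list becomes the single source of truth. `StrategySketch`, `parseStrategies`, and `retriesWithoutCbo` are hypothetical names and not part of Hive; only the strategy names come from the diff above.

   ```java
   import java.util.Arrays;
   import java.util.Set;
   import java.util.stream.Collectors;

   // Hypothetical helper; it does not use or mirror any Hive API.
   final class StrategySketch {

       // Strategy name taken from the diff under review.
       static final String RECOMPILE_WITHOUT_CBO = "recompile_without_cbo";

       /** Splits a comma separated strategy list, ignoring blanks and padding. */
       static Set<String> parseStrategies(String value) {
           return Arrays.stream(value.split(","))
               .map(String::trim)
               .filter(name -> !name.isEmpty())
               .collect(Collectors.toSet());
       }

       /** A CBO failure leads to a non-CBO retry only if the plugin is listed. */
       static boolean retriesWithoutCbo(String strategies) {
           return parseStrategies(strategies).contains(RECOMPILE_WITHOUT_CBO);
       }
   }
   ```

   Under this assumption the example configuration above would not retry after a CBO failure, because `recompile_without_cbo` is absent from the list.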






Issue Time Tracking
-------------------

    Worklog Id:     (was: 696590)
    Time Spent: 3h 40m  (was: 3.5h)

> Multi Insert query fails on CBO path 
> -------------------------------------
>
>                 Key: HIVE-25792
>                 URL: https://issues.apache.org/jira/browse/HIVE-25792
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Zoltan Haindrich
>            Assignee: Peter Vary
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> {code}
> set hive.cbo.enable=true;
> drop table if exists aa1;
> drop table if exists bb1;
> drop table if exists cc1;
> drop table if exists dd1;
> drop table if exists ee1;
> drop table if exists ff1;
> create table aa1 ( stf_id string);
> create table bb1 ( stf_id string);
> create table cc1 ( stf_id string);
> create table ff1 ( x string);
> explain
> from ff1 as a join cc1 as b 
> insert overwrite table aa1 select   stf_id GROUP BY b.stf_id
> insert overwrite table bb1 select b.stf_id GROUP BY b.stf_id
> ;
> {code}



