[ 
https://issues.apache.org/jira/browse/HIVE-29654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087888#comment-18087888
 ] 

Stamatis Zampetakis commented on HIVE-29654:
--------------------------------------------

If the request is to extend {{hive.log.explain.output}} to include CBO plan in 
the logs then that's fine with me. Looking into the code around this flag it 
appears that we may have this already (in JSON format) if we enable 
{{hive.server2.webui.show.graph}}. Anyways, let's confirm what we are missing 
from {{hive.log.explain.output}} (and the other related configs) and if the CBO 
plan is not there find a way to incorporate it.

At this stage maybe we should leave aside the discussion about a completely new 
command (EXPLAIN EXECUTE) to avoid perplexing the proposal. Minor extensions to 
hive.log.explain.output will be easier to land if we agree that this satisfies 
our needs.

> Provide an easy way to execute a query and get the CBO plan at the same time
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-29654
>                 URL: https://issues.apache.org/jira/browse/HIVE-29654
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Thomas Rebele
>            Priority: Major
>
> It would be nice if Hive provided an easy way to execute a query and get the 
> CBO and stage plans of that execution. Currently this is easy to achieve for 
> the stage plans, but not for the CBO plan (see background section below).
> Some approaches:
>  * output CBO plans if {{hive.log.explain.output}} is enabled
>  * add another config option (e.g., {{{}hive.log.cbo.plan{}}}) to output the 
> CBO plans when the query is executed
>  * add a new command EXPLAIN EXECUTE that prints the query plans (CBO and 
> stage plans) and then executes these plans
> ----
> Background: 
> I've looked into the results of a TPC-DS benchmark. I wanted to understand 
> why the performance of a query has changed. I had the execution time of the 
> query, some logs, and some EXPLAIN plans. Unfortunately the EXPLAIN plans 
> were not helpful, as the plans were not the same (different operator ids).
> I did an experiment with TPC-DS metadata and KLL histograms enabled. I 
> executed EXPLAIN CBO JOINCOST several times in the same session. For 8 runs, 
> I got 5 different plans.
> I had a chat with an AI tool, and it suggested some options to get the plan. 
> After a while I got it working:
>  * stage plans: {{SET hive.server2.logging.operation.level = VERBOSE; SET 
> hive.log.explain.output=true;}}
>  * CBO plan: enable DEBUG logging for 
> {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}} in 
> {{{}hive-log4j2.properties{}}}. Also executing {{SET 
> hive.server2.logging.operation.level = VERBOSE;}} helps, as the plans will 
> then show up in beeline.
> It would be nice to make this a bit simpler. It might be the case that the 
> user cannot easily change the log4j2 properties. Also, setting the DEBUG 
> level globally just to get the plan for one query looks like a lot of 
> overhead, if other users are running queries at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to