[ 
https://issues.apache.org/jira/browse/HIVE-29654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18087890#comment-18087890
 ] 

Thomas Rebele commented on HIVE-29654:
--------------------------------------

Thanks for pointing out the query history! The scenario that I want to cover 
would be as follows:
----
An Apache Hive installation is used by many users concurrently. A user has a 
query that runs slowly. The user thinks this is a perf issue with Apache Hive, 
so they want to create a ticket. They want to include the CBO and stage plans 
in the ticket. They do not want to restart Apache Hive, as it would affect many 
users.
----
I had a look at the query history, and it seems that it needs to be enabled at 
boot time. I got an "Cannot modify hive.query.history.enabled at runtime (see 
HIVE-29170). It is in the list of parameters that can't be modified at runtime 
or is prefixed by a restricted variable", when I tried to set that conf option. 
I got query history working after a few restarts of Hive. The query history 
does not yet contain the CBO plans.

I think it would be good to have a way for option 1, returning the CBO and 
stage plans to the user, and option 4 as well.

> Provide an easy way to execute a query and get the CBO plan at the same time
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-29654
>                 URL: https://issues.apache.org/jira/browse/HIVE-29654
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Thomas Rebele
>            Priority: Major
>
> It would be nice if Hive provided an easy way to execute a query and get the 
> CBO and stage plans of that execution. Currently this is easy to achieve for 
> the stage plans, but not for the CBO plan (see background section below).
> Some approaches:
>  * output CBO plans if {{hive.log.explain.output}} is enabled
>  * add another config option (e.g., {{{}hive.log.cbo.plan{}}}) to output the 
> CBO plans when the query is executed
>  * add a new command EXPLAIN EXECUTE that prints the query plans (CBO and 
> stage plans) and then executes these plans
> ----
> Background: 
> I've looked into the results of a TPC-DS benchmark. I wanted to understand 
> why the performance of a query has changed. I had the execution time of the 
> query, some logs, and some EXPLAIN plans. Unfortunately the EXPLAIN plans 
> were not helpful, as the plans were not the same (different operator ids).
> I did an experiment with TPC-DS metadata and KLL histograms enabled. I 
> executed EXPLAIN CBO JOINCOST several times in the same session. For 8 runs, 
> I got 5 different plans.
> I had a chat with an AI tool, and it suggested some options to get the plan. 
> After a while I got it working:
>  * stage plans: {{SET hive.server2.logging.operation.level = VERBOSE; SET 
> hive.log.explain.output=true;}}
>  * CBO plan: enable DEBUG logging for 
> {{org.apache.hadoop.hive.ql.parse.CalcitePlanner}} in 
> {{{}hive-log4j2.properties{}}}. Also executing {{SET 
> hive.server2.logging.operation.level = VERBOSE;}} helps, as the plans will 
> then show up in beeline.
> It would be nice to make this a bit simpler. It might be the case that the 
> user cannot easily change the log4j2 properties. Also, setting the DEBUG 
> level globally just to get the plan for one query looks like a lot of 
> overhead, if other users are running queries at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to