[jira] [Commented] (HIVE-14362) Support explain analyze in Hive

Gopal V (JIRA) Mon, 22 Aug 2016 18:27:43 -0700

    [ https://issues.apache.org/jira/browse/HIVE-14362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15431939#comment-15431939 ]


Gopal V commented on HIVE-14362:
--------------------------------

[~pxiong]: I tested this patch. Running explain analyze seems to disable 
vectorization for every query that runs after it. The patch contains:

{code}
+      HiveConf.setBoolVar(conf, HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED, false);
{code}
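
A minimal sketch of one way to keep that override from leaking, assuming the conf touched by the patch is the session-wide one: disable vectorization on a per-query copy instead. `java.util.Properties` stands in for `HiveConf` here so the snippet runs on its own, and the class and method names are hypothetical, not taken from the patch.

```java
import java.util.Properties;

public class ScopedConfOverride {
    // Property name behind HiveConf.ConfVars.HIVE_VECTORIZATION_ENABLED.
    static final String VECTORIZATION = "hive.vectorized.execution.enabled";

    // Returns a per-query copy with vectorization turned off.
    // The session-level conf passed in is left untouched.
    static Properties confForExplainAnalyze(Properties sessionConf) {
        Properties queryConf = new Properties();
        queryConf.putAll(sessionConf);
        queryConf.setProperty(VECTORIZATION, "false");
        return queryConf;
    }

    public static void main(String[] args) {
        Properties session = new Properties();
        session.setProperty(VECTORIZATION, "true");

        Properties query = confForExplainAnalyze(session);

        System.out.println("query conf:   " + query.getProperty(VECTORIZATION));
        System.out.println("session conf: " + session.getProperty(VECTORIZATION));
    }
}
```

With this shape, later queries in the same session still read the original setting.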

And explain analyze does not actually work.

{code}
2016-08-23T01:13:10,961  INFO [667a4e5f-6194-438f-85d6-339aca3ebecc main] physical.AnnotateRunTimeStatsOptimizer: setRuntimeStatsDir for RS_8
2016-08-23T01:13:10,962  INFO [667a4e5f-6194-438f-85d6-339aca3ebecc main] fs.FSStatsPublisher: created : file:/tmp/gopal/667a4e5f-6194-438f-85d6-339aca3ebecc/hive_2016-08-23_01-13-10_705_7555853843090786759-1/-local-10000/RS_8
{code}

The output paths are local dirs (file:/tmp/...), not HDFS dirs, so the stats 
written on one machine never make their way back to the HiveServer2 box. The 
query then fails with:

{code}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: [Error 30002]: StatsPublisher cannot be connected to.There was a error while connecting to the StatsPublisher, and retrying might help. If you dont want the query to fail because accurate statistics could not be collected, set hive.stats.reliable=false
        at org.apache.hadoop.hive.ql.exec.Operator.publishRunTimeStats(Operator.java:1444)
        at org.apache.hadoop.hive.ql.exec.Operator.closeOp(Operator.java:723)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.closeOp(TableScanOperator.java:270)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:691)
        at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:705)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:433)
{code}
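
The local-versus-HDFS distinction above can be read straight off the URI scheme of the stats dir. The sketch below is only an illustration using `java.net.URI`: the class and method names are hypothetical, and the `hdfs://` path is a made-up example, not one from these logs.

```java
import java.net.URI;

public class StatsDirScheme {
    // True when the directory URI explicitly uses the "file" scheme,
    // meaning it lives on the local disk of whichever machine wrote it.
    static boolean isLocalFileUri(String dir) {
        return "file".equals(URI.create(dir).getScheme());
    }

    public static void main(String[] args) {
        // Path reported by fs.FSStatsPublisher in the log above.
        String fromLog = "file:/tmp/gopal/667a4e5f-6194-438f-85d6-339aca3ebecc/"
                + "hive_2016-08-23_01-13-10_705_7555853843090786759-1/-local-10000/RS_8";
        // Hypothetical HDFS location, for contrast only.
        String onHdfs = "hdfs://namenode:8020/tmp/hive/stats/RS_8";

        System.out.println(isLocalFileUri(fromLog));
        System.out.println(isLocalFileUri(onHdfs));
    }
}
```

A check like this only classifies explicit `file:` URIs; it says nothing about scheme-less paths, which Hadoop resolves against the configured default filesystem.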

> Support explain analyze in Hive
> -------------------------------
>
>                 Key: HIVE-14362
>                 URL: https://issues.apache.org/jira/browse/HIVE-14362
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Pengcheng Xiong
>            Assignee: Pengcheng Xiong
>         Attachments: HIVE-14362.01.patch, HIVE-14362.02.patch, compare_on_cluster.pdf
>
>
> Right now all the explain levels only support stats before query runs. We 
> would like to have an explain analyze similar to Postgres for real stats 
> after query runs. This will help to identify the major gap between 
> estimated/real stats and make not only query optimization better but also 
> query performance debugging easier.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
