[ 
https://issues.apache.org/jira/browse/HIVE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936327#comment-13936327
 ] 

Lefty Leverenz commented on HIVE-3777:
--------------------------------------

Documented in the wiki's Configuration Properties; please review:

{quote}
hive.stats.reliable
Default Value: false
Added In: Hive 0.10.0 with HIVE-1653
New Behavior In:  Hive 0.13.0 with HIVE-3777

Whether queries will fail because statistics cannot be collected completely 
accurately. If this is set to true, reading/writing from/into a partition or 
unpartitioned table may fail because the statistics could not be computed 
accurately. If it is set to false, the operation will succeed.

In Hive 0.13.0 and later, if hive.stats.reliable is false and statistics could 
not be computed correctly, the operation can still succeed and update the 
statistics but it sets a partition property "areStatsAccurate" to false. If the 
application needs accurate statistics, they can then be obtained in the 
background.
{quote}
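
The draft wording above can be read as a small decision table. The following is only a sketch of that reading in Python, not Hive code: the names `Outcome` and `stats_outcome` are hypothetical, "may fail" is modeled as always failing, and the value of "areStatsAccurate" when statistics are computed correctly is left unset because the text does not state it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    operation_succeeds: bool
    stats_written: bool
    # Value of the "areStatsAccurate" property; None means the wiki text
    # does not say what (if anything) is recorded in that case.
    are_stats_accurate: Optional[bool]


def stats_outcome(stats_reliable: bool, stats_computed_ok: bool) -> Outcome:
    """Hypothetical model of the Hive 0.13.0+ behavior as documented above."""
    if stats_computed_ok:
        # Assumption: nothing fails when stats are accurate; the property
        # value for this case is not stated, so it is left as None.
        return Outcome(True, True, None)
    if stats_reliable:
        # hive.stats.reliable=true: the read/write "may fail"; modeled here
        # as failing, with no stats written (also an assumption).
        return Outcome(False, False, None)
    # hive.stats.reliable=false: succeed, update stats, flag them inaccurate.
    return Outcome(True, True, False)
```

Under this reading, `stats_outcome(False, False)` is the only case that sets the property to false, which is the case question 2 below asks about.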

Questions: 

# Does an unpartitioned table have the "areStatsAccurate" property too?
# Does the new behavior happen when hive.stats.reliable is false, not true?  (I 
ask because the jira description implies that this is a fix for the problem of 
long-running queries failing when statistics aren't accurate, but as I 
understand it the query doesn't fail when hive.stats.reliable is false.  
Perhaps I'm confused, so please make sure the wikidoc is correct.)

Quick ref:
* [Language Manual -- Configuration Properties:  hive.stats.reliable 
|https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.stats.reliable]

> add a property in the partition to figure out if stats are accurate
> -------------------------------------------------------------------
>
>                 Key: HIVE-3777
>                 URL: https://issues.apache.org/jira/browse/HIVE-3777
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.13.0
>            Reporter: Namit Jain
>            Assignee: Ashutosh Chauhan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-3777.2.patch, HIVE-3777.2.patch, HIVE-3777.3.patch, 
> HIVE-3777.4.patch, HIVE-3777.5.patch, HIVE-3777.patch
>
>
> Currently, the stats task tries to update the statistics in the table/partition
> being updated after the table/partition is loaded. If updating these stats
> fails (for any reason), the operation either succeeds (writing inaccurate
> stats) or fails, depending on whether hive.stats.reliable is set to true.
> This can be bad for applications that do not always care about reliable
> stats, since the query may have taken a long time to execute, only to fail
> at the end.
> Another property should be added to the partition: areStatsAccurate. If
> hive.stats.reliable is set to false and stats could not be computed correctly,
> the operation would still succeed and update the stats, but set
> areStatsAccurate to false.
> If the application cares about accurate stats, they can be obtained in the
> background.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
