[ 
https://issues.apache.org/jira/browse/HIVE-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16472377#comment-16472377
 ] 

Jason Dere commented on HIVE-19489:
-----------------------------------

We've seen many users end up with badly wrong stats because most of the data 
was written by external apps, and wrong stats can be as bad for planning as 
having no stats at all. The point of this change is to put the responsibility 
on the user to call ANALYZE TABLE to keep stats up to date, rather than 
assuming auto-gather will take care of it. I'll try to follow up with Ashutosh 
on this one.
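
For illustration, a minimal sketch of the manual refresh described above. The 
table name web_logs_ext is hypothetical, and the table is assumed to be an 
unpartitioned external table; a partitioned table would need a PARTITION 
clause.

```sql
-- web_logs_ext is a hypothetical, unpartitioned external table.
-- Recompute basic table-level stats (row count, data size) after an
-- external app has written new files under the table location.
ANALYZE TABLE web_logs_ext COMPUTE STATISTICS;

-- Optionally recompute column-level stats as well.
ANALYZE TABLE web_logs_ext COMPUTE STATISTICS FOR COLUMNS;

-- Inspect the table parameters to check what was recorded.
DESCRIBE FORMATTED web_logs_ext;
```

How often to run this depends on how often the external writers change the 
data; nothing in Hive is assumed to schedule it automatically here.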

You are right that LOAD DATA does not seem to fully perform stats auto-gather, 
though some StatsWork still appears in the plan during load.

> Disable stats autogather for external tables
> --------------------------------------------
>
>                 Key: HIVE-19489
>                 URL: https://issues.apache.org/jira/browse/HIVE-19489
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> Hive auto-gather of table statistics can result in incorrect generation of 
> stats (and the stats being marked as accurate) in the case of external tables 
> where the data is being written by external apps.
> To avoid this issue, stats autogather will be disabled on external tables 
> when loading/inserting into a table with existing data, if 
> HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS is enabled. In this situation, 
> users should rely on explicitly calling ANALYZE TABLE on their external 
> tables to make sure the stats are kept up-to-date.
> Autogather of stats will still be allowed to occur on external tables in the 
> case of INSERT OVERWRITE or LOAD DATA OVERWRITE, since the existing data is 
> being removed and so the stats calculated on the inserted/loaded data should 
> be accurate.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
