[ https://issues.apache.org/jira/browse/HIVE-19489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16471762#comment-16471762 ]
Zoltan Haindrich commented on HIVE-19489:
-----------------------------------------

I'm not sure if we should disable it globally, but there could be an option to do that - I think it would probably be useful to have a table-level option to prevent it from happening on specific tables.

Without statistics the planner will start operating blind: I think fs-level stats are not really good; auto gathering may also collect column stats, which could be very useful during estimations.

afaik auto gathering should not happen during LOAD DATA statements

cc: [~ashutoshc]

> Disable stats autogather for external tables
> --------------------------------------------
>
>                 Key: HIVE-19489
>                 URL: https://issues.apache.org/jira/browse/HIVE-19489
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>            Priority: Major
>
> Hive auto-gather of table statistics can result in incorrect generation of
> stats (and the stats being marked as accurate) in the case of external tables
> where the data is being written by external apps.
> To avoid this issue, stats autogather will be disabled on external tables
> when loading/inserting into a table with existing data, if
> HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS is enabled. In this situation,
> users should rely on explicitly calling ANALYZE TABLE on their external
> tables to make sure the stats are kept up-to-date.
> Autogather of stats will still be allowed to occur on external tables in the
> case of INSERT OVERWRITE or LOAD DATA OVERWRITE, since the existing data is
> being removed and so the stats calculated on the inserted/loaded data should
> be accurate.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
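
For readers who want the rule from the issue description spelled out, here is a rough sketch of the decision as a small Python function. It is not Hive code: WriteOp, should_autogather_stats and all field names are hypothetical, and disable_unsafe_ops only stands in for HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS. It models the description as written and assumes managed tables are left unaffected. It does not cover the table-level option or the LOAD DATA remark from the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteOp:
    """Hypothetical description of a write into a table (not a Hive class)."""
    is_external: bool         # target is an external table
    overwrite: bool           # INSERT OVERWRITE or LOAD DATA OVERWRITE
    table_has_data: bool      # table already holds data before the write
    disable_unsafe_ops: bool  # stands in for HIVE_DISABLE_UNSAFE_EXTERNALTABLE_OPERATIONS


def should_autogather_stats(op: WriteOp) -> bool:
    """Return True if stats autogather would be allowed under the proposed rule."""
    if not op.is_external:
        return True  # assumption: managed tables are outside the scope of the proposal
    if not op.disable_unsafe_ops:
        return True  # the restriction only applies when the flag is enabled
    if op.overwrite:
        return True  # existing data is replaced, so new stats describe all the data
    # Plain insert/load into an external table: autogather is skipped only when
    # the table already has data; the user then runs ANALYZE TABLE explicitly.
    return not op.table_has_data
```

Under this sketch, a plain insert into a non-empty external table with the flag enabled is the only case where autogather is skipped; every overwrite still gathers stats.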