[ https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-18851:
------------------------------------
    Description:
HIVE-18571 started as a couple small fixes for MM tables, but ends up making stats for ACID tables work better in general, but not rigorously and not for all cases.
This is a follow-up JIRA to implement stats for ACID properly (potentially also with ACID semantics similar to those of queries, but that could be another follow-up - for now, at least they should be based on the correct set of files).
Overall I've discovered that Hive stats code is spread all over in random places in code base and is brittle and inconsistent, esp. for any complex scenario like ACID tables.
So, instead of making ad-hoc fixes everywhere, I think at the minimum it should be moved to a single spot (so that e.g. BasicStatsTask, BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same code with the same logic) and made valid for ACID.

  was:
HIVE-18571 that started as a couple small fixes for MM tables, but ends up making stats for ACID tables work better in general, but not rigorously and not for all cases.
This is a follow-up JIRA to implement stats for ACID properly (potentially also with ACID semantics similar to those of queries, but that could be another follow-up - for now, at least they should be based on the correct set of files).
Overall I've discovered that Hive stats code is spread all over in random places in code base and is brittle and inconsistent, esp. for any complex scenario like ACID tables.
So, instead of making ad-hoc fixes everywhere, I think at the minimum it should be moved to a single spot (so that e.g. BasicStatsTask, BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same code with the same logic) and made valid for ACID.

> make Hive basic stats valid for ACID; clean up and refactor the code
> --------------------------------------------------------------------
>
>                 Key: HIVE-18851
>                 URL: https://issues.apache.org/jira/browse/HIVE-18851
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>              Labels: ACID
>
> HIVE-18571 started as a couple small fixes for MM tables, but ends up making
> stats for ACID tables work better in general, but not rigorously and not for
> all cases.
> This is a follow-up JIRA to implement stats for ACID properly (potentially
> also with ACID semantics similar to those of queries, but that could be
> another follow-up - for now, at least they should be based on the correct set
> of files).
> Overall I've discovered that Hive stats code is spread all over in random
> places in code base and is brittle and inconsistent, esp. for any complex
> scenario like ACID tables.
> So, instead of making ad-hoc fixes everywhere, I think at the minimum it
> should be moved to a single spot (so that e.g. BasicStatsTask,
> BasicStatsTaskNoJob, metastore "quick" stats generation, etc all use the same
> code with the same logic) and made valid for ACID.