[ 
https://issues.apache.org/jira/browse/HIVE-18851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18851:
------------------------------------
    Description: 
HIVE-18571 started as a couple of small fixes for MM tables, but has ended up 
making stats for ACID tables work better in general, though not rigorously and 
not for all cases.
This is a follow-up JIRA to implement stats for ACID properly (potentially also 
with ACID semantics similar to those of queries, but that could be another 
follow-up - for now, at least they should be based on the correct set of files).
Overall, I've discovered that the Hive stats code is scattered across the code 
base and is brittle and inconsistent, especially for complex scenarios like 
ACID tables. 
So, instead of making ad-hoc fixes everywhere, I think that at a minimum it 
should be moved to a single spot (so that e.g. BasicStatsTask, 
BasicStatsTaskNoJob, metastore "quick" stats generation, etc. all use the same 
code with the same logic) and made valid for ACID.

  was:
Based on HIVE-18571 that started as a couple small fixes for MM tables, but 
ends up making stats for ACID tables work better in general, but not rigorously 
and not for all cases.
Overall I've discovered that Hive stats code is spread all over in random 
places in code base and is brittle and inconsistent, esp. for any complex 
scenario like ACID tables. 
I think at the minimum it should be moved to a single spot (so that e.g. 
BasicStatsTask, BasicStatsTaskNoJob, metastore stats generation, etc all use 
the same code with the same logic) and made valid for ACID.


> make Hive basic stats valid for ACID; clean up and refactor the code
> --------------------------------------------------------------------
>
>                 Key: HIVE-18851
>                 URL: https://issues.apache.org/jira/browse/HIVE-18851
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> HIVE-18571 started as a couple of small fixes for MM tables, but has ended up 
> making stats for ACID tables work better in general, though not rigorously 
> and not for all cases.
> This is a follow-up JIRA to implement stats for ACID properly (potentially 
> also with ACID semantics similar to those of queries, but that could be 
> another follow-up - for now, at least they should be based on the correct set 
> of files).
> Overall, I've discovered that the Hive stats code is scattered across the 
> code base and is brittle and inconsistent, especially for complex scenarios 
> like ACID tables. 
> So, instead of making ad-hoc fixes everywhere, I think that at a minimum it 
> should be moved to a single spot (so that e.g. BasicStatsTask, 
> BasicStatsTaskNoJob, metastore "quick" stats generation, etc. all use the 
> same code with the same logic) and made valid for ACID.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
