[ https://issues.apache.org/jira/browse/HIVE-1362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470686#comment-13470686 ]
Shreepadma Venugopalan commented on HIVE-1362:
----------------------------------------------

I assume that when you say row-level statistics you are referring to table statistics. Today, table statistics are stored as part of table_params. The table_params table is mapped to the TTable object in memory, and it looks like the existing APIs sufficed for that.

We want a dedicated Thrift API for column stats for the following reasons:

1. Column statistics are a property of the column, not the table, and hence don't belong in table_params. Furthermore, we have seen customers with tables that are hundreds to thousands of columns wide. Storing this information as table_params entries is going to bloat that table, and it will also make the output of DESCRIBE EXTENDED unreadable.

2. We want column statistics to be first-class metadata. To achieve that, we have to provide dedicated Thrift APIs to query and update them. We want the Thrift API to be self-documenting, i.e. if someone tells you that the metastore supports column stats, you should be able to look at the Thrift IDL and figure out which method you need to use to store or retrieve column stats. Right now a lot of the API doesn't satisfy that goal, since many methods are overloaded and other features are implemented by adding new key/value properties to different catalog objects, which aren't easy to document via the Thrift API.

3. Additionally, storing column statistics as key/value pairs in the table_params table is not space efficient. We would need to repeat the keys for each column in the table for which statistics are gathered. Furthermore, by storing column stats in the table_params table we would de-normalize the schema completely and incur a performance penalty from performing self-joins, though not necessarily in the metastore db, to retrieve the statistics associated with a column.
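To make the space argument in point 3 concrete, here is a minimal sketch in Python. It is not taken from the patch: the statistic names, the "column.<name>.<stat>" key layout, and the ColumnStats record are all hypothetical and chosen only for illustration. It compares flattening per-column stats into table-level key/value pairs with keeping one record per column.

```python
from dataclasses import dataclass

# Hypothetical statistic names; the real metastore schema may differ.
STAT_NAMES = ("num_nulls", "num_distincts", "max_col_len")


def stats_as_table_params(columns):
    """Flatten per-column stats into table-level key/value pairs.

    Every statistic name is repeated once per column, so the number of
    entries grows as len(columns) * len(STAT_NAMES).
    """
    params = {}
    for col, stats in columns.items():
        for name in STAT_NAMES:
            # Hypothetical key layout: column.<column name>.<stat name>
            params[f"column.{col}.{name}"] = str(stats[name])
    return params


@dataclass(frozen=True)
class ColumnStats:
    """One record per column, as a dedicated stats structure might hold."""
    column: str
    num_nulls: int
    num_distincts: int
    max_col_len: int


def stats_as_records(columns):
    """Keep stats normalized: one typed record per column."""
    return [ColumnStats(column=col, **stats) for col, stats in columns.items()]


# A wide table with 1000 columns, in the range mentioned above.
columns = {
    f"c{i}": {"num_nulls": 0, "num_distincts": i, "max_col_len": 8}
    for i in range(1000)
}

params = stats_as_table_params(columns)
records = stats_as_records(columns)

print(len(params))   # number of key/value entries added to the table's params
print(len(records))  # number of per-column records
```

With the key/value layout, reading the stats for a single column means filtering the table's whole parameter map by key prefix, whereas the per-column record can be looked up directly by column name.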
> column level statistics
> -----------------------
>
>                 Key: HIVE-1362
>                 URL: https://issues.apache.org/jira/browse/HIVE-1362
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Statistics
>            Reporter: Ning Zhang
>            Assignee: Shreepadma Venugopalan
>         Attachments: HIVE-1362.1.patch.txt, HIVE-1362.2.patch.txt,
>                      HIVE-1362.3.patch.txt, HIVE-1362.4.patch.txt,
>                      HIVE-1362-gen_thrift.1.patch.txt, HIVE-1362-gen_thrift.2.patch.txt,
>                      HIVE-1362-gen_thrift.3.patch.txt, HIVE-1362-gen_thrift.4.patch.txt
>
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira