[ https://issues.apache.org/jira/browse/HIVE-6157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13876140#comment-13876140 ]
Shreepadma Venugopalan commented on HIVE-6157:
----------------------------------------------

Currently, the API fetches statistics for a given column. hive.stats.fetch.column.stats fetches stats for all columns for all partitions in all tables. Bad idea. HIVE-4301 was filed to support a bulk fetch API so that stats for all columns for all partitions in multiple tables can be fetched with a single call. Feel free to pick up HIVE-4301.

> Fetching column stats slower than the 101 during rush hour
> ----------------------------------------------------------
>
>                 Key: HIVE-6157
>                 URL: https://issues.apache.org/jira/browse/HIVE-6157
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 0.13.0
>            Reporter: Gunther Hagleitner
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6157.prelim.patch
>
>
> "hive.stats.fetch.column.stats" controls whether the column stats for a table
> are fetched during explain (in Tez: during query planning). On my setup (1
> table 4000 partitions, 24 columns) the time spent in semantic analyze goes
> from ~1 second to ~66 seconds when turning the flag on. 65 seconds spent
> fetching column stats...
> The reason is probably that the APIs force you to make separate metastore
> calls for each column in each partition. That's probably the first thing that
> has to change. The question is if in addition to that we need to cache this
> in the client or store the stats as a single blob in the database to further
> cut down on the time. However, the way it stands right now column stats seem
> unusable.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
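
For a rough sense of scale, the numbers quoted in the issue description can be worked through as below. This is a back-of-the-envelope sketch in Python, not Hive code: `FakeMetastore`, `get_column_stats`, and `get_bulk_stats` are hypothetical names that only count round trips, and the sketch assumes one metastore call per column per partition, which the description itself only guesses at ("probably").

```python
"""Count metastore round trips: per-column calls vs. one bulk call.

Everything here is a hypothetical stand-in, not the Hive metastore API.
"""


class FakeMetastore:
    """In-memory stand-in that only counts how many calls it receives."""

    def __init__(self, partitions, columns):
        self.partitions = partitions
        self.columns = columns
        self.calls = 0

    def get_column_stats(self, partition, column):
        # One round trip per (partition, column) pair.
        self.calls += 1
        return {"partition": partition, "column": column}

    def get_bulk_stats(self, partitions, columns):
        # One round trip covering every requested (partition, column) pair.
        self.calls += 1
        return [
            {"partition": p, "column": c}
            for p in partitions
            for c in columns
        ]


def fetch_per_column(store):
    """Fetch stats the assumed current way: one call per column per partition."""
    return [
        store.get_column_stats(p, c)
        for p in store.partitions
        for c in store.columns
    ]


def fetch_bulk(store):
    """Fetch the same stats with a single bulk call (the HIVE-4301 idea)."""
    return store.get_bulk_stats(store.partitions, store.columns)


# Setup from the description: 1 table, 4000 partitions, 24 columns.
PARTITIONS = [f"p{i}" for i in range(4000)]
COLUMNS = [f"c{i}" for i in range(24)]

per_column_store = FakeMetastore(PARTITIONS, COLUMNS)
per_column_rows = fetch_per_column(per_column_store)

bulk_store = FakeMetastore(PARTITIONS, COLUMNS)
bulk_rows = fetch_bulk(bulk_store)

# ~65 seconds spread over all per-column calls, expressed in milliseconds.
ms_per_call = 65.0 / per_column_store.calls * 1000

print("per-column calls:", per_column_store.calls)
print("bulk calls:", bulk_store.calls)
print("approx. ms per call:", round(ms_per_call, 2))
```

Under that assumption the setup in the description needs 4000 x 24 = 96,000 calls, so the reported ~65 seconds comes to well under a millisecond per call. The cost would then be dominated by the number of round trips rather than by any single slow call, which is the case a bulk fetch API is meant to address.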