[ https://issues.apache.org/jira/browse/HIVE-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955400#comment-14955400 ]
Pengcheng Xiong commented on HIVE-12010:
----------------------------------------

I have double-checked all the q file changes. The main change is in the stats for tables loaded from the src table, which has 500 rows. Previously, the stats were wrong and reported fewer than 500 rows (e.g., 55).

LGTM +1

> Tests should use FileSystem based stats collection mechanism
> ------------------------------------------------------------
>
> Key: HIVE-12010
> URL: https://issues.apache.org/jira/browse/HIVE-12010
> Project: Hive
> Issue Type: Task
> Components: Statistics
> Reporter: Ashutosh Chauhan
> Assignee: Ashutosh Chauhan
> Attachments: HIVE-12010.1.patch, HIVE-12010.2.patch,
> HIVE-12010.3.patch, HIVE-12010.4.patch, HIVE-12010.patch
>
>
> Although the fs-based collection mechanism has been the default for the last
> few releases, tests still use jdbc for stats collection. The main advantage
> of fs-based collection over the jdbc-based one is scalability. In the jdbc
> case, a single database (normally co-located with the metastore relational
> database) handles all the stats collected by all the tasks. This single
> database is responsible for maintaining the consistency of the stats, so it
> becomes a bottleneck and faces scalability issues when the number of tasks is
> huge. In the fs case, each task writes its stats into HDFS, which does not
> have this scalability issue.

-- 
This message was sent by Atlassian JIRA
(v6.3.4#6332)
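To illustrate why the fs-based approach avoids a single shared bottleneck, here is a minimal sketch under stated assumptions: a local temporary directory stands in for HDFS, each task writes its row count to its own file, and one aggregation step sums the files afterwards. The class and method names (FsStatsSketch, publish, aggregate) and the one-file-per-task layout are hypothetical and are not Hive's actual stats publisher/aggregator classes or on-disk format.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FsStatsSketch {

    // Hypothetical publisher: each task writes only its own file, so tasks
    // never contend on a shared database row or connection.
    static void publish(Path statsDir, String taskId, long numRows) throws IOException {
        Files.createDirectories(statsDir);
        Files.writeString(statsDir.resolve("stats-" + taskId), Long.toString(numRows));
    }

    // Hypothetical aggregator: after all tasks finish, a single reader sums
    // the per-task files to produce the table-level row count.
    static long aggregate(Path statsDir) throws IOException {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(statsDir, "stats-*")) {
            for (Path f : files) {
                total += Long.parseLong(Files.readString(f).trim());
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for an HDFS stats directory (assumption: local FS, Java 11+).
        Path dir = Files.createTempDirectory("fs-stats");
        // Three tasks together loading a 500-row table such as src.
        publish(dir, "task0", 200);
        publish(dir, "task1", 200);
        publish(dir, "task2", 100);
        System.out.println(aggregate(dir)); // prints 500
    }
}
```

The design choice mirrors the argument in the issue: writes are spread across independent files instead of being funneled into one database, and consistency is reduced to a single summation once all tasks are done.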