[jira] [Commented] (HIVE-12010) Tests should use FileSystem based stats collection mechanism

Pengcheng Xiong (JIRA) Tue, 13 Oct 2015 11:20:45 -0700

    [ 
https://issues.apache.org/jira/browse/HIVE-12010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14955400#comment-14955400
 ]


Pengcheng Xiong commented on HIVE-12010:
----------------------------------------

I have double-checked all the q file changes. The main change is in the stats 
for a table loaded from the src table, which has 500 rows. Previously, the 
wrong stats showed a row count below 500 (e.g., 55). LGTM +1

> Tests should use FileSystem based stats collection mechanism
> ------------------------------------------------------------
>
>                 Key: HIVE-12010
>                 URL: https://issues.apache.org/jira/browse/HIVE-12010
>             Project: Hive
>          Issue Type: Task
>          Components: Statistics
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-12010.1.patch, HIVE-12010.2.patch, 
> HIVE-12010.3.patch, HIVE-12010.4.patch, HIVE-12010.patch
>
>
> Although the fs-based collection mechanism has been the default for the last 
> few releases, tests still use jdbc for stats collection. The main advantage 
> of fs-based collection over the jdbc-based one is scalability. In the jdbc 
> case, a single database (normally co-located with the metastore relational 
> database) handles all the stats collected by all the tasks. This single 
> database is responsible for maintaining the consistency of the stats, so it 
> becomes a bottleneck and faces scalability issues when the number of tasks 
> is huge. In the fs case, each task writes its stats into hdfs, which does 
> not have this scalability issue.
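
To make the fs-based idea in the description concrete, here is a rough toy 
sketch in Python. It is not Hive's actual implementation: the file names, the 
JSON layout, and the helper names (write_task_stats, aggregate_stats, collect) 
are all hypothetical, and a local temporary directory stands in for hdfs. It 
only illustrates the shape of the approach: each task writes its own stats 
file with no shared writer, and the row counts are summed once afterwards.

```python
import json
import tempfile
from pathlib import Path


def write_task_stats(stats_dir: Path, task_id: int, num_rows: int) -> None:
    """Each task writes its own file, so tasks never contend on a shared store."""
    path = stats_dir / f"task_{task_id}.json"  # hypothetical naming scheme
    path.write_text(json.dumps({"numRows": num_rows}))


def aggregate_stats(stats_dir: Path) -> int:
    """Sum the per-task row counts once all tasks have finished writing."""
    return sum(
        json.loads(p.read_text())["numRows"]
        for p in sorted(stats_dir.glob("task_*.json"))
    )


def collect(rows_per_task) -> int:
    """Run the toy pipeline: one stats file per task, then a single aggregation."""
    with tempfile.TemporaryDirectory() as tmp:  # stand-in for an hdfs directory
        stats_dir = Path(tmp)
        for task_id, num_rows in enumerate(rows_per_task):
            write_task_stats(stats_dir, task_id, num_rows)
        return aggregate_stats(stats_dir)
```

For example, collect([200, 200, 100]) sums three per-task files into a total 
of 500 rows. Under this model the cost of writing stats grows with the number 
of tasks but is spread across independent files rather than funnelled through 
one database.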



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
