[jira] [Created] (HIVE-6500) Stats collection via filesystem

Ashutosh Chauhan created HIVE-6500:
--------------------------------------

             Summary: Stats collection via filesystem
                 Key: HIVE-6500
                 URL: https://issues.apache.org/jira/browse/HIVE-6500
             Project: Hive
          Issue Type: New Feature
          Components: Statistics
            Reporter: Ashutosh Chauhan
            Assignee: Ashutosh Chauhan



Recently, support for stats gathering via counters was [added | 
https://issues.apache.org/jira/browse/HIVE-4632]. Although it is useful, it has 
the following issues:
* [Length of counter group name is limited | 
https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340]
* [Length of counter name is limited | 
https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337]
* [Number of distinct counter groups is limited | 
https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343]
* [Number of distinct counters is limited | 
https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334]
Although these limits are configurable, setting them to higher values 
increases the memory load on the AM and the job history server.
Whether these limits make sense is [debatable | 
https://issues.apache.org/jira/browse/MAPREDUCE-5680], but it is desirable that Hive 
not depend on the framework's counter feature, so that we can evolve 
this feature without relying on support from the framework. Filesystem-based 
counter collection is a step in that direction.
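
As a rough illustration of the idea only, not the design proposed for Hive, the sketch below has each task write its stats to its own file under a shared directory, and a final step sum them. The function names, the one-JSON-file-per-task layout, and the stat keys are all assumptions made for this example. Because the keys live in file contents rather than in counter names, their length and number are not bounded by the framework's counter limits.

```python
import json
import os
from collections import Counter


def publish_task_stats(stats_dir, task_id, stats):
    """Write one task's stats to its own file.

    Hypothetical layout: <stats_dir>/<task_id>.json, holding a mapping of
    partition key -> {stat name: value}.
    """
    os.makedirs(stats_dir, exist_ok=True)
    path = os.path.join(stats_dir, f"{task_id}.json")
    with open(path, "w") as f:
        json.dump(stats, f)
    return path


def aggregate_stats(stats_dir):
    """Sum the stats from every task file, per partition key."""
    totals = {}
    for name in sorted(os.listdir(stats_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(stats_dir, name)) as f:
            task_stats = json.load(f)
        for key, values in task_stats.items():
            # Counter.update adds the values of matching stat names.
            totals.setdefault(key, Counter()).update(values)
    return {key: dict(values) for key, values in totals.items()}
```

For example, if two tasks each publish `numRows` and `rawDataSize` for the same partition key, `aggregate_stats` returns a single entry for that key with both values summed.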



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
