Ashutosh Chauhan created HIVE-6500: -------------------------------------- Summary: Stats collection via filesystem Key: HIVE-6500 URL: https://issues.apache.org/jira/browse/HIVE-6500 Project: Hive Issue Type: New Feature Components: Statistics Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan
Recently, support for stats gathering via counter was [added | https://issues.apache.org/jira/browse/HIVE-4632] Although, its useful it has following issues: * [Length of counter group name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L340] * [Length of counter name is limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L337] *[Number of distinct counter groups are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L343] *[Number of distinct counters are limited | https://github.com/apache/hadoop-common/blob/branch-2.3/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/MRJobConfig.java?source=c#L334] Although, these limits are configurable, but setting them to higher value implies increased memory load on AM and job history server. Now, whether these limits makes sense or not is [debatable | https://issues.apache.org/jira/browse/MAPREDUCE-5680] it is desirable that Hive doesn't make use of counters features of framework so that it we can evolve this feature without relying on support from framework. Filesystem based counter collection is a step in that direction. -- This message was sent by Atlassian JIRA (v6.1.5#6160)