[ 
https://issues.apache.org/jira/browse/HIVE-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-5916:
-----------------------------------

    Attachment: HIVE-5916.4.patch

Patch is ready for review. 
[~navis] The major change in this patch is that it changes the aggKey so that 
no aggregation is required on the client side in StatsTask. This also shortens 
the key so that it won't run into limits. Now we just use 
dbName.TblName/p1=v1/ instead of long, elaborate filesystem paths. This patch 
also lowers the number of unique counters required. Earlier it was num of 
Partitions * num of Tasks * num of Stats; now it is just num of Partitions * 
num of Tasks. 
This will conflict with HIVE-5936 in a major way. Since it has a bunch of 
additional improvements, do you think it makes sense to get this one in first?
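
To illustrate the key format and the counter arithmetic described above, here 
is a rough sketch. The class, method names, and sample values are hypothetical 
and are not taken from the patch; it only shows how a key of the form 
dbName.TblName/p1=v1/ could be built and how the counter count changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AggKeySketch {
    // Hypothetical helper (not Hive's actual API): builds a short key of the
    // form dbName.tblName/p1=v1/p2=v2/ from an ordered partition spec.
    static String buildKey(String dbName, String tblName, Map<String, String> partSpec) {
        StringBuilder key = new StringBuilder(dbName).append('.').append(tblName).append('/');
        for (Map.Entry<String, String> e : partSpec.entrySet()) {
            key.append(e.getKey()).append('=').append(e.getValue()).append('/');
        }
        return key.toString();
    }

    // Earlier scheme: one unique counter per partition, task, and statistic.
    static long countersBefore(long partitions, long tasks, long stats) {
        return partitions * tasks * stats;
    }

    // New scheme: one unique counter per partition and task.
    static long countersAfter(long partitions, long tasks) {
        return partitions * tasks;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps partition columns in insertion order.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("ds", "2013-12-01");
        System.out.println(buildKey("default", "sales", spec)); // default.sales/ds=2013-12-01/
        System.out.println(countersBefore(100, 50, 2));          // 10000
        System.out.println(countersAfter(100, 50));              // 5000
    }
}
```

With 100 partitions, 50 tasks, and 2 statistics (assumed numbers), the count 
drops from 10000 to 5000 unique counters.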

> No need to aggregate statistics collected via counter mechanism 
> ----------------------------------------------------------------
>
>                 Key: HIVE-5916
>                 URL: https://issues.apache.org/jira/browse/HIVE-5916
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.13.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>         Attachments: HIVE-5916.2.patch, HIVE-5916.3.patch, HIVE-5916.4.patch, 
> HIVE-5916.patch
>
>
> Aggregating these statistics results in unnecessary computation and wastes 
> cluster resources. It is not required, since counters are already aggregated 
> by the JobTracker.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
