[jira] [Updated] (HIVE-16341) Tez Task Execution Summary has incorrect input record counts on some operators

Jason Dere (JIRA) Fri, 31 Mar 2017 12:12:16 -0700

     [ https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jason Dere updated HIVE-16341:
------------------------------
    Attachment: HIVE-16341.2.patch

I tried checking for tez.task.generate.counters.per.io, but it did not seem to
be visible from the HiveConf; it may have been set in the tez-conf on HDFS.
I've changed the patch to use the Tez counters if they exist, and to fall back
to the old method, using the Hive counters, if it can't find them.
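
As a rough illustration of that fallback (not the code in HIVE-16341.2.patch), the lookup could be sketched as below. The class name, the method name, and the plain maps standing in for the Tez and Hive counter groups are all hypothetical; the counter names are supplied by the caller rather than assumed here.

```java
import java.util.Map;

// Hypothetical helper, for illustration only; not taken from the patch.
final class InputRecordCounts {
    private InputRecordCounts() {}

    /**
     * Returns the input record count for a vertex.
     * Prefers the Tez counter when it is present; otherwise falls back to
     * the Hive counter, and to 0 if neither counter was recorded.
     */
    static long inputRecords(Map<String, Long> tezCounters, String tezCounterName,
                             Map<String, Long> hiveCounters, String hiveCounterName) {
        Long fromTez = tezCounters.get(tezCounterName);
        if (fromTez != null) {
            return fromTez; // Tez counter exists, so use it
        }
        // Old method: rely on the Hive counter
        return hiveCounters.getOrDefault(hiveCounterName, 0L);
    }
}
```

Checking for the counter itself, rather than for the configuration flag, sidesteps the problem of the setting not being visible from the HiveConf.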

> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16341
>                 URL: https://issues.apache.org/jira/browse/HIVE-16341
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-16341.1.patch, HIVE-16341.2.patch
>
>
> {noformat}
> Task Execution Summary
> --------------------------------------------------------------------------------------------------------------------------------
>   VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> --------------------------------------------------------------------------------------------------------------------------------
>      Map 1          167                0             0       17640.00     2,109,200       23,068    150,000,004      11,995,136
>     Map 11            5                0             0       10559.00        71,960          633      4,023,690         799,900
>     Map 13            1                0             0        2244.00         6,090           29             25               3
>      Map 3            1                0             0        2849.00         7,080           99             25               3
>      Map 5          271                0             0       55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
>      Map 7          241                0             0       91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10            1                0             0        1010.00         1,900            0              4               0
> Reducer 12            1                0             0        3854.00         1,320            0        799,900               1
> Reducer 14            1                0             0        1420.00         3,790           45              3               1
>  Reducer 2            1                0             0        9720.00         6,220          122     11,995,136               1
>  Reducer 4            1                0             0         810.00         2,100          105              3               1
>  Reducer 6            1                0             0       24863.00         3,260            5  1,500,000,161               1
>  Reducer 8          412                0             0       88215.00    17,106,440      184,524  2,165,208,640           1,864
>  Reducer 9            2                0             0       29752.00         3,980            0          1,864               4
> --------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries that use runtime filtering. The INPUT_RECORDS look
> incorrect for the reducers responsible for aggregating the
> min/max/bloomfilter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows 12M
> input records, but the task logs for Reducer 2 show only 167 input records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer
> 8), but the total output rows for Map 1 (rather than just the rows going to
> each specific vertex) are being counted in the input rows for both Reducer 2
> and Reducer 8.
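
The difference between the two ways of counting can be sketched as follows. This is only an illustration of the behaviour described above, not Hive or Tez code: the `EdgeCounts` class and its methods are hypothetical, and the per-edge split used in the usage note is an assumption (only the 167 rows into Reducer 2 and the 11,995,136 total for Map 1 come from the report).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a DAG's edges, for illustration only.
final class EdgeCounts {
    static final class Edge {
        final String source;
        final String target;
        final long records; // rows actually sent along this edge

        Edge(String source, String target, long records) {
            this.source = source;
            this.target = target;
            this.records = records;
        }
    }

    // Reported behaviour: each target is credited with the TOTAL output of
    // every upstream vertex, regardless of which edge the rows went to.
    static Map<String, Long> inputFromVertexTotals(List<Edge> edges) {
        Map<String, Long> totalOut = new HashMap<>();
        for (Edge e : edges) {
            totalOut.merge(e.source, e.records, Long::sum);
        }
        Map<String, Long> input = new HashMap<>();
        for (Edge e : edges) {
            input.merge(e.target, totalOut.get(e.source), Long::sum);
        }
        return input;
    }

    // Expected behaviour: each target is credited only with the rows that
    // arrived on its own incoming edges.
    static Map<String, Long> inputFromEdges(List<Edge> edges) {
        Map<String, Long> input = new HashMap<>();
        for (Edge e : edges) {
            input.merge(e.target, e.records, Long::sum);
        }
        return input;
    }
}
```

With two edges out of Map 1, 167 rows to Reducer 2 and (assumed) the remaining 11,994,969 rows to Reducer 8, `inputFromVertexTotals` credits Reducer 2 with 11,995,136 input records, matching the summary, while `inputFromEdges` gives the 167 seen in the task logs.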



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
