[ https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950138#comment-15950138 ]
Gopal V commented on HIVE-16341:
--------------------------------

[~jdere]: I think the existing codepath assumes {{tez.task.generate.counters.per.io=false}}. Fixing this correctly requires per-IO counters to always be enabled (alternatively, check that setting and if/else the counter checks?).

> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16341
>                 URL: https://issues.apache.org/jira/browse/HIVE-16341
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-16341.1.patch
>
>
> {noformat}
> Task Execution Summary
> --------------------------------------------------------------------------------------------------------------------------------
> VERTICES      TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> --------------------------------------------------------------------------------------------------------------------------------
> Map 1                 167                0             0      17640.00     2,109,200       23,068    150,000,004      11,995,136
> Map 11                  5                0             0      10559.00        71,960          633      4,023,690         799,900
> Map 13                  1                0             0       2244.00         6,090           29             25               3
> Map 3                   1                0             0       2849.00         7,080           99             25               3
> Map 5                 271                0             0      55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
> Map 7                 241                0             0      91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10              1                0             0       1010.00         1,900            0              4               0
> Reducer 12              1                0             0       3854.00         1,320            0        799,900               1
> Reducer 14              1                0             0       1420.00         3,790           45              3               1
> Reducer 2               1                0             0       9720.00         6,220          122     11,995,136               1
> Reducer 4               1                0             0        810.00         2,100          105              3               1
> Reducer 6               1                0             0      24863.00         3,260            5  1,500,000,161               1
> Reducer 8             412                0             0      88215.00    17,106,440      184,524  2,165,208,640           1,864
> Reducer 9               2                0             0      29752.00         3,980            0          1,864               4
> --------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries using runtime filtering.
> Noticed that the INPUT_RECORDS look incorrect for the reducers that are responsible for
> aggregating the min/max/bloomfilter (Reducers 12, 14, 2, 6). For example, Reducer 2 shows
> 12M input records, but the task logs for Reducer 2 show only 167 input records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer
> 8), but the total output rows for Map 1 (rather than just the rows going to
> each specific vertex) are being counted in the input rows for both Reducer 2
> and Reducer 8.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
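A minimal Java sketch of the miscount, assuming a simplified DAG model. Everything in it (`CounterAttribution`, `Edge`, `inputRecords`, and the row counts) is hypothetical and is not Hive or Tez code; the `perIoCountersEnabled` flag merely stands in for the `tez.task.generate.counters.per.io` setting named in the comment. When only vertex-level totals are available, every consumer of a vertex is credited with that vertex's whole output. This is consistent with the summary above, where Reducer 2's INPUT_RECORDS (11,995,136) equals Map 1's OUTPUT_RECORDS. Summing per-edge counts instead credits each reducer only with the rows on its own edge.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical model of INPUT_RECORDS attribution in a task execution summary.
 * None of these names are Hive or Tez APIs; they only illustrate the counting bug.
 */
public class CounterAttribution {

    /** A DAG edge plus the records that actually travelled on it (a per-IO count). */
    record Edge(String source, String destination, long records) {}

    /**
     * Vertex-level fallback: each consumer is credited with its source's TOTAL
     * output, so a vertex feeding two consumers is counted in full by both.
     */
    static Map<String, Long> inputFromSourceTotals(List<Edge> edges, Map<String, Long> outputTotals) {
        Map<String, Long> inputs = new TreeMap<>();
        for (Edge e : edges) {
            inputs.merge(e.destination(), outputTotals.get(e.source()), Long::sum);
        }
        return inputs;
    }

    /** Per-IO counters: credit each consumer only with the records on its own edge. */
    static Map<String, Long> inputFromPerEdgeCounts(List<Edge> edges) {
        Map<String, Long> inputs = new TreeMap<>();
        for (Edge e : edges) {
            inputs.merge(e.destination(), e.records(), Long::sum);
        }
        return inputs;
    }

    /**
     * One possible reading of the suggested if/else: use per-edge counts when
     * per-IO counters were generated (hypothetical flag), otherwise fall back.
     */
    static Map<String, Long> inputRecords(List<Edge> edges, Map<String, Long> outputTotals,
                                          boolean perIoCountersEnabled) {
        return perIoCountersEnabled
                ? inputFromPerEdgeCounts(edges)
                : inputFromSourceTotals(edges, outputTotals);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: Map 1 sends 10 rows to a small runtime-filter
        // reducer and 990 rows to a join reducer that also reads Map 7.
        List<Edge> edges = List.of(
                new Edge("Map 1", "Reducer 2", 10),
                new Edge("Map 1", "Reducer 8", 990),
                new Edge("Map 7", "Reducer 8", 500));
        Map<String, Long> outputTotals = Map.of("Map 1", 1000L, "Map 7", 500L);

        System.out.println("vertex totals: " + inputRecords(edges, outputTotals, false));
        System.out.println("per-IO counts: " + inputRecords(edges, outputTotals, true));
    }
}
```

Running `main` prints `{Reducer 2=1000, Reducer 8=1500}` for the vertex-total path and `{Reducer 2=10, Reducer 8=1490}` for the per-IO path. The fallback branch is kept only to show why vertex totals over-count whenever a vertex has more than one output; whether a real fix should require per-IO counters or branch on them is the question raised in the comment.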