[ https://issues.apache.org/jira/browse/HIVE-16341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Dere updated HIVE-16341:
------------------------------
    Attachment: HIVE-16341.2.patch

Tried checking for tez.task.generate.counters.per.io, but this did not seem to be visible from the hiveConf, maybe this was set in the tez-conf on HDFS.
I've changed the patch to use the Tez counters if they exist, and to fall back to the old method using the Hive counters if it can't find them.

> Tez Task Execution Summary has incorrect input record counts on some operators
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-16341
>                 URL: https://issues.apache.org/jira/browse/HIVE-16341
>             Project: Hive
>          Issue Type: Bug
>          Components: Tez
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-16341.1.patch, HIVE-16341.2.patch
>
>
> {noformat}
> Task Execution Summary
> ------------------------------------------------------------------------------------------------------------------------------
> VERTICES    TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> ------------------------------------------------------------------------------------------------------------------------------
> Map 1               167                0             0      17640.00     2,109,200       23,068    150,000,004      11,995,136
> Map 11                5                0             0      10559.00        71,960          633      4,023,690         799,900
> Map 13                1                0             0       2244.00         6,090           29             25               3
> Map 3                 1                0             0       2849.00         7,080           99             25               3
> Map 5               271                0             0      55834.00    12,934,890      358,376  1,500,000,001   1,500,000,161
> Map 7               241                0             0      91243.00     5,020,860       71,182  1,827,250,341     652,413,443
> Reducer 10            1                0             0       1010.00         1,900            0              4               0
> Reducer 12            1                0             0       3854.00         1,320            0        799,900               1
> Reducer 14            1                0             0       1420.00         3,790           45              3               1
> Reducer 2             1                0             0       9720.00         6,220          122     11,995,136               1
> Reducer 4             1                0             0        810.00         2,100          105              3               1
> Reducer 6             1                0             0      24863.00         3,260            5  1,500,000,161               1
> Reducer 8           412                0             0      88215.00    17,106,440      184,524  2,165,208,640           1,864
> Reducer 9             2                0             0      29752.00         3,980            0          1,864               4
> ------------------------------------------------------------------------------------------------------------------------------
> {noformat}
> Seeing this on queries using runtime filtering. Noticed the INPUT_RECORDS
> look incorrect for the reducers that are responsible for aggregating the
> min/max/bloomfilter (Reducers 12, 14, 2, 6). For example Reducer 2 shows 12M
> input records. However looking at the task logs for Reducer 2, there were
> only 167 input records.
> It looks like Map 1 has 2 different output vertices (Reducer 2 and Reducer
> 8), but the total output rows for Map 1 (rather than just the rows going to
> each specific vertex) is being counted in the input rows for both Reducer 2
> and Reducer 8.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
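
The fallback described in the comment above can be sketched roughly as follows. This is a minimal Python illustration, not the Java code in the patch: the function name, the dict-based counter lookups, and the (source, destination) keying are all assumptions made for the example. The only figures taken from the report are Map 1's 11,995,136 total output records and the 167 records that Reducer 2 actually read.

```python
def input_records(vertex, sources, per_edge_records, total_output_records):
    """Count input records for `vertex`, preferring per-edge counters.

    per_edge_records     -- hypothetical dict: (source, destination) -> records
                            sent along that one edge (stands in for per-IO
                            Tez counters, which may not be available)
    total_output_records -- hypothetical dict: source -> total records the
                            source emitted across ALL of its output edges
                            (stands in for the old counter-based method)
    """
    total = 0
    for src in sources:
        edge_count = per_edge_records.get((src, vertex))
        if edge_count is not None:
            # Per-edge counter exists: count only rows sent to this vertex.
            total += edge_count
        else:
            # Fall back to the source's total output. This overcounts when
            # the source feeds more than one downstream vertex.
            total += total_output_records[src]
    return total


total_output = {"Map 1": 11_995_136}

# Per-edge counter available: Reducer 2 is charged only for its own edge.
with_edge = input_records(
    "Reducer 2", ["Map 1"], {("Map 1", "Reducer 2"): 167}, total_output
)

# Per-edge counter missing: falls back to Map 1's total, as in the report.
without_edge = input_records("Reducer 2", ["Map 1"], {}, total_output)

print(with_edge)     # 167
print(without_edge)  # 11995136
```

The fallback branch reproduces the inflated figure from the summary, which is why it is only useful when per-edge counters cannot be found.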