> I didn't see data skew for that reducer. It has similar amount of
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for
> join key [4092813312923569]

The ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS is what is relevant, not the raw record count. A reducer can receive about as many records as its peers and still have most of them packed into a handful of join keys.

The row containers being spilled to disk means that at least one key in the join has > 10000 values.

If you have Tez, this comes up when you run the SkewAnalyzer.

https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41

Cheers,
Gopal
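
PS: a rough sketch of the records-to-groups check, in case it helps. This is not what the Tez SkewAnalyzer does internally; it is only an illustration. The function names, the reducer ids, the counter values and the `factor` cut-off are all made up, and I am assuming you have already pulled REDUCE_INPUT_RECORDS and REDUCE_INPUT_GROUPS per reducer out of the job counters by hand.

```python
def records_per_group(records, groups):
    """Average number of rows per distinct key seen by one reducer."""
    return records / groups if groups else 0.0


def skewed_reducers(counters, factor=10.0):
    """Return ids of reducers whose rows-per-key ratio is more than
    `factor` times the median ratio. `factor` is an arbitrary cut-off
    chosen for this sketch, not a Hive or Tez setting."""
    ratios = {
        rid: records_per_group(c["REDUCE_INPUT_RECORDS"], c["REDUCE_INPUT_GROUPS"])
        for rid, c in counters.items()
    }
    ordered = sorted(ratios.values())
    median = ordered[len(ordered) // 2]
    return sorted(
        rid for rid, ratio in ratios.items()
        if median > 0 and ratio > factor * median
    )


# Hypothetical counters: every reducer gets the same number of records,
# but the last one sees far fewer distinct keys.
counters = {
    "r_000000": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 100_000},
    "r_000001": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 100_000},
    "r_000002": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 50},
}

print(skewed_reducers(counters))
```

With these numbers all three reducers look identical by record count, yet only the last one is flagged. Keep in mind the ratio is an average per reducer, so a single very heavy key among many light ones can still hide in it; treat it as a hint about where to look.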