> I didn't see data skew for that reducer. It has similar amount of
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for 
> join key [4092813312923569]


What matters is the ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS, i.e. the average number of rows per join key, not the record count alone.

Row containers being spilled to disk means that at least one key in the join 
has more than 10000 values.
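
To make the ratio concrete, here is a minimal Python sketch. The reducer names and counter values are made up for illustration and are not taken from your job; only the two counter names are real. It shows how two reducers with the same REDUCE_INPUT_RECORDS can have very different rows-per-key averages.

```python
# Hypothetical per-reducer counters, for illustration only.
counters = {
    "reducer_000": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 250_000},
    "reducer_001": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 40},
}


def rows_per_key(c):
    """Average number of rows per distinct join key seen by one reducer."""
    groups = c["REDUCE_INPUT_GROUPS"]
    if groups == 0:
        return 0.0  # a reducer with no input groups has no input rows either
    return c["REDUCE_INPUT_RECORDS"] / groups


ratios = {name: rows_per_key(c) for name, c in counters.items()}

# Print the reducers with the heaviest keys first.
for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {ratio:,.1f} rows per key on average")
```

The largest key on a reducer holds at least the average number of rows, so a high ratio guarantees a heavy key. A low ratio does not rule one out, since a single hot key can hide among many small ones.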

If you have Tez, this comes up when you run the SkewAnalyzer.

https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41

Cheers,

Gopal