Re: In reduce task,i have a join operation ,and i found "org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1" cast much long

Gopal Vijayaraghavan Thu, 19 Oct 2017 21:47:00 -0700

> . I didn't see data skew for that reducer. It has similar amount of 
> REDUCE_INPUT_RECORDS as other reducers.
…
> org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000 rows for 
> join key [4092813312923569]



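What matters is the ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS, not the record count alone. A reducer can receive about as many records as the others and still have far fewer distinct join keys, which means a few keys carry most of the rows.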
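As a rough illustration of that ratio check, here is a minimal Python sketch. The counter values, the reducer names, and the "10x the median" cutoff are all made up for the example; real numbers would come from the per-task counters of your job. Also note the ratio is only an average number of rows per key, so it is a hint rather than proof of a single hot key.

```python
import statistics

def records_per_group(counters):
    """Average number of rows per join key for each reducer."""
    ratios = {}
    for task, c in counters.items():
        groups = c["REDUCE_INPUT_GROUPS"]
        records = c["REDUCE_INPUT_RECORDS"]
        # Guard against reducers that received no keys at all.
        ratios[task] = records / groups if groups else 0.0
    return ratios

def flag_skewed(ratios, factor=10.0):
    """Reducers whose ratio exceeds `factor` times the median ratio.

    The factor is an arbitrary illustrative threshold, not a Hive or Tez default.
    """
    median = statistics.median(ratios.values())
    return sorted(t for t, r in ratios.items() if r > factor * median)

# Hypothetical counters: same record count everywhere, very different group counts.
counters = {
    "reducer_0": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 500_000},
    "reducer_1": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 400_000},
    "reducer_2": {"REDUCE_INPUT_RECORDS": 1_000_000, "REDUCE_INPUT_GROUPS": 2_000},
}

ratios = records_per_group(counters)
skewed = flag_skewed(ratios)
```

Here all three reducers look identical by REDUCE_INPUT_RECORDS, but only reducer_2 stands out once you divide by REDUCE_INPUT_GROUPS.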

 

Row containers being spilled to disk means that at least one key in the join 
has more than 10000 values.

If you have Tez, this comes up when you run the SkewAnalyzer.

https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41

 

Cheers,

Gopal
