Hi Gopal,

Thanks for your input! In my case I'm using MapReduce, not Tez. Let me be
more specific and give you more details.

For this job there are 298 maps and 74 reduces. All the maps completed
quickly, within 1 minute, and 73 of the 74 reduces completed in about 2
minutes.

Now only 1 reduce task is still running (seemingly forever). Here's a
screenshot of the job details: https://ibb.co/eBDj6R

I noticed something interesting: MAP_OUTPUT_RECORDS and
REDUCE_INPUT_RECORDS don't match for the whole job (99,073,863 vs.
98,105,913, a difference of 967,950 records, roughly 1%).

Here's a screenshot of the counters of the dangling reduce task:
https://ibb.co/dHgyY6

The ratio of REDUCE_INPUT_RECORDS to REDUCE_INPUT_GROUPS is 1. What does
that mean?

For comparison, here's a screenshot of the counters of a different reduce
task, one which completed within 1 minute. It also has a ratio of 1:
https://ibb.co/mzoHRR
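
In case it helps show how I'm reading those counters, below is a minimal
sketch of the check I'm doing (assuming the standard Hadoop client API;
ReduceRatio is just a name I made up, and the job ID is passed as the
first argument):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;
    import org.apache.hadoop.mapreduce.TaskCounter;

    // Hypothetical helper, not part of the job itself: prints the
    // REDUCE_INPUT_RECORDS / REDUCE_INPUT_GROUPS ratio, i.e. the
    // average number of values per distinct reduce key.
    public class ReduceRatio {
        public static void main(String[] args) throws Exception {
            Cluster cluster = new Cluster(new Configuration());
            Job job = cluster.getJob(JobID.forName(args[0])); // e.g. job_..._0001
            Counters c = job.getCounters();
            long records = c.findCounter(TaskCounter.REDUCE_INPUT_RECORDS).getValue();
            long groups = c.findCounter(TaskCounter.REDUCE_INPUT_GROUPS).getValue();
            // A ratio of 1 means each reduce key carries exactly one value.
            System.out.printf("records=%d groups=%d ratio=%.3f%n",
                    records, groups, (double) records / groups);
        }
    }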

I also compared the task logs. Below, No. 1 is for the dangling reduce
task, and Nos. 2 and 3 are for two completed reduce tasks.
1. https://ibb.co/caKVfm
2. https://ibb.co/earJ0m
3. https://ibb.co/edQiY6

I don't understand what the running reduce task is doing. Are there any
other logs that could be helpful?
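
Regarding your earlier point about row containers spilling: if I
understand it correctly, all records sharing a join key are routed to the
same reducer, so a single hot key would pin exactly one task while the
rest finish. Here's a toy sketch of that routing (HotKeyDemo is a made-up
name; it only exercises the stock HashPartitioner with this job's reduce
count and the join key from my earlier log line, not our actual job code):

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

    // Toy illustration: every record with this join key hashes to the
    // same partition, so one hot key keeps one of the 74 reducers busy
    // long after the other 73 have finished.
    public class HotKeyDemo {
        public static void main(String[] args) {
            HashPartitioner<Text, LongWritable> part = new HashPartitioner<>();
            int reducers = 74; // this job's reduce count
            Text joinKey = new Text("4092813312923569"); // key from the log
            System.out.println("key -> reducer "
                    + part.getPartition(joinKey, new LongWritable(0), reducers));
        }
    }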

Regards,
Daniel

On Thu, Oct 19, 2017 at 9:45 PM, Gopal Vijayaraghavan <gop...@apache.org>
wrote:

> > . I didn't see data skew for that reducer. It has similar amount of
> REDUCE_INPUT_RECORDS as other reducers.
> …
> > org.apache.hadoop.hive.ql.exec.CommonJoinOperator: table 0 has 8000
> rows for join key [4092813312923569]
>
>
> The ratio of REDUCE_INPUT_RECORDS and REDUCE_INPUT_GROUPS is what is
> relevant.
>
>
>
> The row containers being spilled to disk means that at least 1 key in the
> join has > 10000 values.
>
> If you have Tez, this comes up when you run the SkewAnalyzer.
>
> https://github.com/apache/tez/blob/master/tez-tools/analyzers/job-analyzer/src/main/java/org/apache/tez/analyzer/plugins/SkewAnalyzer.java#L41
>
>
>
> Cheers,
>
> Gopal
>
