Did you look at the task logs to see why those tasks failed? Since it's a back-end error, the console output doesn't tell you much; the task logs should have a stack trace that shows why each attempt failed, and you can go from there.
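
If the logs only show attempts being killed after the 600-second timeout you quote below, the reducers may simply be slow or overloaded rather than crashing. In that case the timeout and the default reducer count can be raised from inside the script. A rough sketch only (the values are just examples, and mapred.task.timeout assumes an MR1 cluster like yours):

    -- give long-running tasks 30 minutes before the tracker kills them
    set mapred.task.timeout '1800000';
    -- use 80 reducers for every reduce-side operator,
    -- instead of adding PARALLEL to each one
    set default_parallel 80;

Either PARALLEL on the join or default_parallel takes precedence over the mapred.reduce.tasks value you found in mapred-default.xml for that job.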
On Fri, Apr 12, 2013 at 8:18 AM, Mua Ban <[email protected]> wrote:
> Hi,
>
> I am very new to Pig/Hadoop; I just started writing my first Pig script a
> couple of days ago and ran into this problem.
>
> My cluster has 9 nodes. I have to join two data sets, big and small, each
> collected over 4 weeks. I first take two subsets of my data (the first week
> of data); let's call them B1 and S1 for the big and small data sets of the
> first week. The full 4-week data sets are B4 and S4.
>
> I ran my script on my cluster to join B1 and S1 and everything is fine; I
> got my joined data. However, when I ran my script to join B4 and S4, the
> script failed. B4 is 39GB, S4 is 300MB. B4 is skewed: some ids appear more
> frequently than others. I tried both 'using skewed' and 'using replicated'
> modes for the join (by appending them to the end of the join clause below);
> they both fail.
>
> Here is my script, and I think it is very simple:
>
> big = load 'bigdir/' using PigStorage(',') as (id:chararray, data:chararray);
> small = load 'smalldir/' using PigStorage(',') as
>     (t1:double, t2:double, data:chararray, id:chararray);
> J = JOIN big by id LEFT OUTER, small by id;
> store J into 'outputdir' using PigStorage(',');
>
> On the tracker's web UI, I see that the job has 40 reducers (I guess since
> the total data is about 40GB and each 1GB needs one reducer under the
> default Pig and Hadoop settings, this is normal). If I use 'parallel 80' in
> the join above, then I see 80 reducers, and the join still fails.
>
> I checked the file mapred-default.xml and found this:
> <name>mapred.reduce.tasks</name>
> <value>1</value>
>
> If I set the value of parallel in the join, it should override this, right?
>
> On the tracker GUI, I see that for different runs the number of completed
> reducers varies from 4 to 10 (out of 40 total reducers). The tracker GUI
> shows the reason for the failed reducers: "Task
> attempt_201304081613_0046_r_000006_0 failed to report status for 600
> seconds. Killing!"
>
> Could you please help?
> Thank you very much,
> -Mua
>
> --------------------------------------------------------------------------------------------------------------
> Here is the error report from the console screen where I ran this script:
>
> job_201304081613_0032  616  0  230  12  32  0    0   0        big  MAP_ONLY
> job_201304081613_0033  705  1  21   6   6   234  2   34  234       SAMPLER
>
> Failed Jobs:
> JobId  Alias  Feature  Message  Outputs
> job_201304081613_0034  small  SKEWED_JOIN  Message: Job failed!
> Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1.
> LastFailedTask: task_201304081613_0034_r_000012
>
> Input(s):
> Successfully read 364285458 records (39528533645 bytes) from:
> "hdfs://d0521b01:24990/user/abc/big/"
> Failed to read data from "hdfs://d0521b01:24990/user/abc/small/"
>
> Output(s):
>
> Counters:
> Total records written : 0
> Total bytes written : 0
> Spillable Memory Manager spill count : 0
> Total bags proactively spilled: 0
> Total records proactively spilled: 0
>
> Job DAG:
> job_201304081613_0032 -> job_201304081613_0033,
> job_201304081613_0033 -> job_201304081613_0034,
> job_201304081613_0034 -> null,
> null
>
> 2013-04-10 20:11:23,815 [main] WARN
>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - Encountered Warning REDUCER_COUNT_LOW 1 time(s).
> 2013-04-10 20:11:23,815 [main] INFO
>   org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>   - Some jobs have failed! Stop running all dependent jobs
> 2013-04-10 20:11:23,815 [main] ERROR org.apache.pig.tools.grunt.GruntParser
>   - ERROR 2997: Encountered IOException. java.io.IOException: Error Recovery
>   for block blk_312487981794332936_26563 failed because recovery from primary
>   datanode 10.6.25.31:54563 failed 6 times. Pipeline was 10.6.25.31:54563.
>   Aborting...
> Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
> 2013-04-10 20:11:23,818 [main] ERROR org.apache.pig.tools.grunt.GruntParser
>   - ERROR 2244: Job failed, hadoop does not return any error message
> Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
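
Written out in full, the two join variants you describe appending to the join clause would look roughly like this (just a sketch reusing your aliases, not tested against your data):

    -- skewed join with an explicit reducer count
    J = JOIN big by id LEFT OUTER, small by id USING 'skewed' PARALLEL 80;

    -- fragment-replicate join: every relation after the first
    -- ('small' here) is held in memory on each task
    J = JOIN big by id LEFT OUTER, small by id USING 'replicated';

With 'replicated', the 300MB small relation plus Java object overhead has to fit in each task's heap, which is also worth checking in those task logs (an OutOfMemoryError there would point to this).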
