Thank you very much for your reply.

Below is the stack trace from the pig_****.log file.

Could you please give me some suggestions?

-Mua
------------------
Backend error message
---------------------
Task attempt_201304081613_0048_r_000001_0 failed to report status for 601
seconds. Killing!

Pig Stack Trace
---------------
ERROR 2997: Unable to recreate exception from backed error: Task
attempt_201304081613_0048_r_000001_0 failed to report status for 601
seconds. Killing!

org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to
recreate exception from backed error: Task
attempt_201304081613_0048_r_000001_0 failed to report status for 601
seconds. Killing!
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:217)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:152)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:383)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1270)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1255)
        at org.apache.pig.PigServer.execute(PigServer.java:1245)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:362)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:132)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:193)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
        at org.apache.pig.Main.run(Main.java:555)
        at org.apache.pig.Main.main(Main.java:111)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
================================================================================
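As a note for readers hitting the same "failed to report status" error: the 600-second limit is Hadoop's task timeout, controlled by the mapred.task.timeout property (the Hadoop 1.x name; the value is in milliseconds). A minimal sketch of raising it from inside a Pig script, assuming that property name applies to this cluster; this buys the reducers more time but does not address the underlying skew:

```pig
-- Sketch: raise the task timeout from the default 600000 ms (10 minutes)
-- to 30 minutes. Pig's SET command passes the property through to the
-- Hadoop job configuration for the jobs this script launches.
SET mapred.task.timeout 1800000;
```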





On Fri, Apr 12, 2013 at 11:29 AM, Cheolsoo Park <[email protected]> wrote:

> Did you look at task logs to see why those tasks failed? Since it's a
> back-end error, the console output doesn't tell you much. Task logs should
> have a stack trace that shows why it failed, and you can go from there.
>
>
>
> On Fri, Apr 12, 2013 at 8:18 AM, Mua Ban <[email protected]> wrote:
>
> > Hi,
> >
> > I am very new to Pig/Hadoop; I started writing my first Pig script a
> > couple of days ago and ran into this problem.
> >
> > My cluster has 9 nodes. I have to join two data sets, big and small,
> > each collected over 4 weeks. I first took two subsets covering the
> > first week of data; call them B1 and S1, the big and small data sets
> > for the first week. The full four-week data sets are B4 and S4.
> >
> > I ran my script on the cluster to join B1 and S1 and everything was
> > fine; I got my joined data. However, when I ran the script to join B4
> > and S4, it failed. B4 is 39GB and S4 is 300MB. B4 is skewed: some ids
> > appear more frequently than others. I tried both 'using skewed' and
> > 'using replicated' modes for the join (by appending them to the end of
> > the join clause below), and both fail.
> >
> > Here is my script; I think it is very simple:
> >
> > big = load 'bigdir/' using PigStorage(',') as (id:chararray, data:chararray);
> > small = load 'smalldir/' using PigStorage(',') as (t1:double,t2:double,data:chararray,id:chararray);
> > J = JOIN big by id LEFT OUTER, small by id;
> > store J into 'outputdir' using PigStorage(',');
> >
> > On the web UI of the tracker, I see that the job has 40 reducers (I
> > guess this is normal, since the total data is about 40GB and by
> > default Pig/Hadoop allocate one reducer per 1GB). If I use 'parallel
> > 80' in the join operation above, I see 80 reducers, and the join
> > still fails.
> >
> > I checked the file mapred-default.xml and found this:
> > <name>mapred.reduce.tasks</name>
> >   <value>1</value>
> >
> > If I set the parallel value in the join operation, it should override
> > this, right?
> >
> >
> > On the tracker GUI, I see that across different runs the number of
> > completed reducers varies from 4 to 10 (out of 40 total reducers).
> > The tracker GUI shows the reason for the failed reducers: "Task
> > attempt_201304081613_0046_r_000006_0 failed to report status for 600
> > seconds. Killing!"
> >
> > Could you please help?
> > Thank you very much,
> > -Mua
> >
> >
> >
> > --------------------------------------------------------------------------------------------------------------
> > Here is the error report from the console screen where I ran this script:
> >
> > job_201304081613_0032   616     0       230     12      32      0   0
> > 0       big     MAP_ONLY
> > job_201304081613_0033   705     1       21      6       6       234 2
> > 34      234             SAMPLER
> >
> > Failed Jobs:
> > JobId   Alias   Feature Message Outputs
> > job_201304081613_0034   small   SKEWED_JOIN     Message: Job failed!
> > Error - # of failed Reduce Tasks exceeded allowed limit. FailedCount: 1.
> > LastFailedTask: task_201304081613_0034_r_000012
> >
> > Input(s):
> > Successfully read 364285458 records (39528533645 bytes) from:
> > "hdfs://d0521b01:24990/user/abc/big/"
> > Failed to read data from "hdfs://d0521b01:24990/user/abc/small/"
> >
> > Output(s):
> >
> > Counters:
> > Total records written : 0
> > Total bytes written : 0
> > Spillable Memory Manager spill count : 0
> > Total bags proactively spilled: 0
> > Total records proactively spilled: 0
> >
> > Job DAG:
> > job_201304081613_0032   ->      job_201304081613_0033,
> > job_201304081613_0033   ->      job_201304081613_0034,
> > job_201304081613_0034   ->      null,
> > null
> >
> >
> > 2013-04-10 20:11:23,815 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Encountered Warning REDUCER_COUNT_LOW 1 time(s).
> > 2013-04-10 20:11:23,815 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Some jobs have failed! Stop running all dependent jobs
> > 2013-04-10 20:11:23,815 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Encountered IOException. java.io.IOException: Error Recovery for block blk_312487981794332936_26563 failed because recovery from primary datanode 10.6.25.31:54563 failed 6 times. Pipeline was 10.6.25.31:54563. Aborting...
> > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
> > 2013-04-10 20:11:23,818 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2244: Job failed, hadoop does not return any error message
> > Details at logfile: /homes/abc/pig-flatten/scripts/pig_1365627648226.log
>
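Since small is only about 300MB, one of the strategies the thread mentions is a fragmented-replicate join, which ships the small relation to every mapper and skips the reduce phase entirely, so the reduce-side timeout cannot fire. A sketch using the relation names and placeholder paths from the script above (the thread reports this mode also failed, so mapper heap size would need checking; left outer is supported when the replicated relation is listed last):

```pig
big = LOAD 'bigdir/' USING PigStorage(',') AS (id:chararray, data:chararray);
small = LOAD 'smalldir/' USING PigStorage(',')
        AS (t1:double, t2:double, data:chararray, id:chararray);

-- 'replicated' loads the last relation into memory on each map task,
-- so the join runs map-side with no reducers; small must fit in memory.
J = JOIN big BY id LEFT OUTER, small BY id USING 'replicated';

STORE J INTO 'outputdir' USING PigStorage(',');
```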
