Well, that is my problem here: i checked the logs on resourcemanager and nodemanagers and saw nothing suspicious, but these logs are too verbose. Could it be the data or avro records which have issues ?
-- Alexandre Fouche Lead operations engineer, cloud architect http://www.cleverscale.com | @cleverscale Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Wednesday 7 November 2012 at 12:36, Jan DolinĂ¡r wrote: > This usually happens when some task fail, their progress is then not > counted, hence the 'restart'. Check your task logs for failures. > > Jan > > > On Wed, Nov 7, 2012 at 12:30 PM, Alexandre Fouche > <alexandre.fou...@cleverscale.com (mailto:alexandre.fou...@cleverscale.com)> > wrote: > > I have a Yarn MR (with two ec2 instances to mapreduce) job on a dataset of > > approximately a thousand avro records, and the map phase is behaving > > erratically. See the progress below > > > > What is going on there ? > > > > hive> select * from nikon where qs_cs_s_aid='VIEW' limit 10; > > > > Total MapReduce jobs = 1 > > Launching Job 1 out of 1 > > Number of reduce tasks is set to 0 since there's no reduce > > operator > > Starting Job = job_1352281315350_0020, Tracking URL = > > http://blabla.ec2.internal:8088/proxy/application_1352281315350_0020/ > > Kill Command = /usr/lib/hadoop/bin/hadoop job > > -Dmapred.job.tracker=blabla.com:8032 (http://blabla.com:8032) -kill > > job_1352281315350_0020 > > Hadoop job information for Stage-1: number of mappers: 4; number > > of reducers: 0 > > > > 2012-11-07 11:14:40,976 Stage-1 map = 0%, reduce = 0% > > 2012-11-07 11:15:06,136 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 10.38 sec > > 2012-11-07 11:15:07,253 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 12.18 sec > > 2012-11-07 11:15:08,371 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 12.18 sec > > 2012-11-07 11:15:09,491 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 12.18 sec > > 2012-11-07 11:15:10,643 Stage-1 map = 2%, reduce = 0%, > > Cumulative CPU 15.42 sec > > (...) > > 2012-11-07 11:15:35,441 Stage-1 map = 28%, reduce = 0%, > > Cumulative CPU 37.77 sec > > 2012-11-07 11:15:36,486 Stage-1 map = 28%, reduce = 0%, > > Cumulative CPU 37.77 sec > > > > here restart at 16% ? > > > > 2012-11-07 11:15:37,692 Stage-1 map = 16%, reduce = 0%, > > Cumulative CPU 21.15 sec > > 2012-11-07 11:15:38,815 Stage-1 map = 16%, reduce = 0%, > > Cumulative CPU 21.15 sec > > 2012-11-07 11:15:39,865 Stage-1 map = 16%, reduce = 0%, > > Cumulative CPU 21.15 sec > > 2012-11-07 11:15:41,064 Stage-1 map = 18%, reduce = 0%, > > Cumulative CPU 22.4 sec > > 2012-11-07 11:15:42,181 Stage-1 map = 18%, reduce = 0%, > > Cumulative CPU 22.4 sec > > 2012-11-07 11:15:43,299 Stage-1 map = 18%, reduce = 0%, > > Cumulative CPU 22.4 sec > > > > here restart at 0% ? > > > > 2012-11-07 11:15:44,418 Stage-1 map = 0%, reduce = 0% > > 2012-11-07 11:16:02,076 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 6.86 sec > > 2012-11-07 11:16:03,193 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 6.86 sec > > 2012-11-07 11:16:04,259 Stage-1 map = 2%, reduce = 0%, > > Cumulative CPU 8.45 sec > > (...) > > 2012-11-07 11:16:31,291 Stage-1 map = 22%, reduce = 0%, > > Cumulative CPU 35.34 sec > > 2012-11-07 11:16:32,414 Stage-1 map = 26%, reduce = 0%, > > Cumulative CPU 37.93 sec > > > > here restart at 11% ? > > > > 2012-11-07 11:16:33,459 Stage-1 map = 11%, reduce = 0%, > > Cumulative CPU 19.53 sec > > 2012-11-07 11:16:34,507 Stage-1 map = 11%, reduce = 0%, > > Cumulative CPU 19.53 sec > > 2012-11-07 11:16:35,731 Stage-1 map = 13%, reduce = 0%, > > Cumulative CPU 21.47 sec > > (...) > > 2012-11-07 11:16:46,839 Stage-1 map = 17%, reduce = 0%, > > Cumulative CPU 24.14 sec > > > > here restart at 0% ? > > > > 2012-11-07 11:16:47,939 Stage-1 map = 0%, reduce = 0% > > 2012-11-07 11:16:56,653 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 7.54 sec > > 2012-11-07 11:16:57,814 Stage-1 map = 1%, reduce = 0%, > > Cumulative CPU 7.54 sec > > (...) > > > > Needless to say the job crashes after some time with an Error: > > java.io.IOException: java.io.IOException: > > java.lang.ArrayIndexOutOfBoundsException: -56 > > > > >