No, they're plain text files, tab-delimited. And I'm expecting one mapper per
file: I have 175 files, and I see 189 map tasks in the web UI. My issue is
that, with 189 map tasks waiting, why is Hadoop using only 2 of my 10 map
slots? I assume all map tasks should be independent.
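For reference, this is the fragment I have in hadoop-site.xml (a minimal sketch; property name as discussed below for 0.18.x, value as in my setup):

```xml
<!-- hadoop-site.xml (Hadoop 0.18.x): per-tasktracker map slot limit.
     Note this is a cluster-level setting read when the tasktracker
     starts, not a per-job setting. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
</property>
```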


On Tue, Apr 21, 2009 at 2:23 PM, Miles Osborne <[email protected]> wrote:

> is your input data compressed?  if so then you will get one mapper per file
>
> Miles
>
> 2009/4/21 javateck javateck <[email protected]>:
> > Hi Koji,
> >
> > Thanks for helping.
> >
> > I don't know why hadoop is using only 2 of my 10 map task slots.
> >
> > Sure, I just cut and pasted the job tracker web UI. Clearly I set the max
> > map tasks to 10 (which I can verify from hadoop-site.xml and from the
> > individual job configuration as well), and the first mapreduce job did
> > run with 10 map tasks when I checked the UI, but all subsequent queries
> > are running with only 2 map tasks. And I have about 176 input files, each
> > around 62~75MB.
> >
> >
> > mapred.tasktracker.map.tasks.maximum = 10
> >
> >   Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed
> >   map      28.04%       189         134       2         53         0        0 / 0
> >   reduce   0.00%        1           1         0         0          0        0 / 0
> >
> > On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi <[email protected]
> >wrote:
> >
> >> It's probably a silly question, but you do have more than 2 mappers on
> >> your second job?
> >>
> >> If yes, I have no idea what's happening.
> >>
> >> Koji
> >>
> >> -----Original Message-----
> >> From: javateck javateck [mailto:[email protected]]
> >> Sent: Tuesday, April 21, 2009 1:38 PM
> >> To: [email protected]
> >> Subject: Re: mapred.tasktracker.map.tasks.maximum
> >>
> >> Right, I set it in hadoop-site.xml before starting the hadoop
> >> processes. I had one job fully utilizing the 10 map slots, but
> >> subsequent queries are only using 2 of them; I don't know why.
> >> I have enough RAM as well, no paging is happening, and I'm running
> >> 0.18.3. Right now all processes are on one machine (namenode, datanode,
> >> jobtracker, tasktracker), with a 2*4-core CPU and 20GB RAM.
> >>
> >>
> >> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi
> >> <[email protected]>wrote:
> >>
> >> > This is a cluster config and not a per job config.
> >> >
> >> > So this has to be set when the mapreduce cluster first comes up.
> >> >
> >> > Koji
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: javateck javateck [mailto:[email protected]]
> >> > Sent: Tuesday, April 21, 2009 1:20 PM
> >> > To: [email protected]
> >> > Subject: mapred.tasktracker.map.tasks.maximum
> >> >
> >> > I set "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> >> > job, it only uses 2 of the 10 slots. Any way to find out why it's only
> >> > using 2?
> >> > thanks
> >> >
> >>
> >
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
