No, it's plain text, tab-delimited (\t). I was expecting roughly one mapper per file: I have 175 files, and the web UI shows 189 map tasks. My question is: with 189 map tasks waiting, why is Hadoop using only 2 of my 10 map slots? I assume all map tasks are independent of each other.
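
For reference, a rough sketch of how 175 files could turn into 189 map tasks. It assumes the default 64 MB dfs.block.size and the 1.1 slop factor that FileInputFormat applies when splitting a file; neither is confirmed for this cluster, and the class and method names are made up for illustration.

```java
public class SplitCountSketch {
    // Assumption: slop factor FileInputFormat allows on the last split of a file.
    static final double SPLIT_SLOP = 1.1;

    /** Number of input splits (map tasks) for one file, given the split size in bytes. */
    static int splitsForFile(long fileBytes, long splitBytes) {
        int splits = 0;
        long remaining = fileBytes;
        // Carve off full-size splits while the remainder exceeds 1.1 x split size.
        while ((double) remaining / splitBytes > SPLIT_SLOP) {
            splits++;
            remaining -= splitBytes;
        }
        // Whatever is left (if anything) becomes the final split.
        if (remaining > 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed default dfs.block.size
        for (long sizeMb : new long[] {62, 70, 71, 75}) {
            System.out.println(sizeMb + " MB -> "
                    + splitsForFile(sizeMb * mb, blockSize) + " split(s)");
        }
    }
}
```

Under those assumptions any file above about 70.4 MB becomes two map tasks, so 14 such files among the 175 would account for 189.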

On Tue, Apr 21, 2009 at 2:23 PM, Miles Osborne <[email protected]> wrote:

> is your input data compressed? if so then you will get one mapper per file
>
> Miles
>
> 2009/4/21 javateck javateck <[email protected]>:
> > Hi Koji,
> >
> > Thanks for helping.
> >
> > I don't know why hadoop is just using 2 out of 10 map tasks slots.
> >
> > Sure, I just cut and paste the job tracker web UI, clearly I set the max
> > tasks to 10(which I can verify from hadoop-site.xml and from the individual
> > job configuration also), and I did have the first mapreduce running at 10
> > map tasks when I checked from UI, but all subsequent queries are running
> > with 2 map tasks. And I have almost 176 files with each input file around
> > 62~75MB.
> >
> > mapred.tasktracker.map.tasks.maximum   10
> >
> > Kind    % Complete  Num Tasks  Pending  Running  Complete  Killed  Failed/Killed Task Attempts
> > map     28.04%      189        134      2        53        0       0 / 0
> > reduce  0.00%       1          1        0        0         0       0 / 0
> >
> > Failed/Killed Task Attempts:
> > <http://etsx18.apple.com:50030/jobfailures.jsp?jobid=job_200904211923_0025>
> > reduce, Pending:
> > <http://etsx18.apple.com:50030/jobtasks.jsp?jobid=job_200904211923_0025&type=reduce&pagenum=1&state=pending>
> >
> > On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi <[email protected]> wrote:
> >
> >> It's probably a silly question, but you do have more than 2 mappers on
> >> your second job?
> >>
> >> If yes, I have no idea what's happening.
> >>
> >> Koji
> >>
> >> -----Original Message-----
> >> From: javateck javateck [mailto:[email protected]]
> >> Sent: Tuesday, April 21, 2009 1:38 PM
> >> To: [email protected]
> >> Subject: Re: mapred.tasktracker.map.tasks.maximum
> >>
> >> right, I set it in hadoop-site.xml before starting the whole hadoop
> >> processes, I have one job running fully utilizing the 10 map tasks, but
> >> subsequent queries are only using 2 of them, don't know why.
> >> I have enough RAM also, no paging out is happening, I'm running on
> >> 0.18.3.
> >> Right now I put all processes on one machine, namenode, datanode,
> >> jobtracker, tasktracker, I have a 2*4core CPU, and 20GB RAM.
> >>
> >>
> >> On Tue, Apr 21, 2009 at 1:25 PM, Koji Noguchi
> >> <[email protected]> wrote:
> >>
> >> > This is a cluster config and not a per job config.
> >> >
> >> > So this has to be set when the mapreduce cluster first comes up.
> >> >
> >> > Koji
> >> >
> >> >
> >> > -----Original Message-----
> >> > From: javateck javateck [mailto:[email protected]]
> >> > Sent: Tuesday, April 21, 2009 1:20 PM
> >> > To: [email protected]
> >> > Subject: mapred.tasktracker.map.tasks.maximum
> >> >
> >> > I set my "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
> >> > task, it's only using 2 out of 10, any way to know why it's only using
> >> > 2?
> >> > thanks
> >> >
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
