Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-26 Thread Yu Li
Your assumption is right that this is a per-map-task setting. However, this buffer stores map output KVPs, not input. Therefore the optimal value depends on how much data you

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-25 Thread Todd Lipcon
ng. However, this buffer stores map output KVPs, not input. Therefore the optimal value depends on how much data your map task is generating. If your ou

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-25 Thread Yu Li
value depends on how much data your map task is generating. If your output per map is greater than io.sort.mb, these rules of thumb that could work for you: 1) Increase max

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Yu Li
your map task is generating. If your output per map is greater than io.sort.mb, these rules of thumb that could work for you: 1) Increase max heap of map tasks to use R

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Todd Lipcon
to use RAM better, but not hit swap. 2) Set io.sort.mb to ~70% of heap. Overall, causing extra "spills" (because of insufficient io.sort.mb) is much better than risking swapping (by setting io.sort.mb and heap too
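
The rule of thumb quoted in this thread (set io.sort.mb to roughly 70% of the map task heap, and accept extra spills rather than risk swapping) can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names, the 1000 MB heap, and the 900 MB map output are hypothetical, and the 70% figure is the poster's suggestion, not a Hadoop default.

```python
def sort_buffer_mb(heap_mb, fraction_percent=70):
    """Suggest an io.sort.mb value (in MB) as a percentage of the map task heap.

    The 70% default mirrors the rule of thumb from the thread; it is an
    assumption for illustration, not a value taken from Hadoop itself.
    """
    if heap_mb <= 0:
        raise ValueError("heap_mb must be positive")
    # Integer arithmetic keeps the result a whole number of megabytes.
    return heap_mb * fraction_percent // 100


def output_exceeds_buffer(map_output_mb, io_sort_mb):
    """Return True when one map task's output is larger than the sort buffer,
    which is the case where extra spills to disk become likely."""
    return map_output_mb > io_sort_mb


if __name__ == "__main__":
    heap = 1000        # hypothetical map task heap, in MB
    map_output = 900   # hypothetical output of one map task, in MB
    buf = sort_buffer_mb(heap)
    print("suggested io.sort.mb:", buf)
    print("extra spills likely:", output_exceeds_buffer(map_output, buf))
```

The heap itself still has to fit in physical memory alongside every other task on the node, which is the point of the "but not hit swap" caveat; the sketch does not check that.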

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread 李钰
Overall, causing extra "spills" (because of insufficient io.sort.mb) is much better than risking swapping (by setting io.sort.mb and heap too large), in terms of relative performance penalty you will pay. Cheers,

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Jeff Zhang
t.mb) is much better than risking swapping (by setting io.sort.mb and heap too large), in terms of relative performance penalty you will pay. Cheers, Sriguru -----Original Message- From: 李钰 [mailto:car.

Re: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread 李钰
Sent: Wednesday, June 23, 2010 12:27 PM To: common-dev@hadoop.apache.org Subject: Questions about recommendation value of the "io.sort.mb" parameter Dear all, Here I've got a question about the "io.sort.mb" p

RE: Questions about recommendation value of the "io.sort.mb" parameter

2010-06-23 Thread Srigurunath Chakravarthi
f relative performance penalty you will pay. Cheers, Sriguru -Original Message- From: 李钰 [mailto:car...@gmail.com] Sent: Wednesday, June 23, 2010 12:27 PM To: common-dev@hadoop.apache.org Subject: Questions about recommendation value of the "io.sort.mb" par

Questions about recommendation value of the "io.sort.mb" parameter

2010-06-22 Thread 李钰
Dear all, Here I've got a question about the "io.sort.mb" parameter. We can find material from Yahoo! or Cloudera that recommends setting this value to 200 if the job scale is large, but I'm confused about this. As far as I know, the tasktracker will launch a child JVM for each task, and "io.sort.mb" p