Your assumption is right that this is a per-map-task setting.
However, this buffer stores map output KVPs, not input. Therefore, the
optimal value depends on how much data your map task is generating.

If your output per map is greater than io.sort.mb, these rules of thumb
could work for you:

1) Increase the max heap of map tasks to use RAM better, but not hit swap.
2) Set io.sort.mb to ~70% of heap.

Overall, causing extra "spills" (because of insufficient io.sort.mb) is
much better than risking swapping (by setting io.sort.mb and heap too
large), in terms of the relative performance penalty you will pay.

Cheers,
Sriguru
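
As an illustration of the rules of thumb in the reply, here is a minimal Python sketch, not part of the original message. The helper names suggest_sort_settings and estimate_spills are hypothetical. The 80% spill threshold is an assumed default for io.sort.spill.percent, and the spill estimate ignores the buffer space Hadoop reserves for record metadata, so treat the numbers as rough guidance only.

```python
import math


def suggest_sort_settings(heap_mb, sort_percent=70):
    """Apply the ~70%-of-heap rule of thumb to a map task's max heap.

    heap_mb is the heap you have decided the map task can use without
    pushing the node into swap; choosing it is outside this sketch.
    """
    io_sort_mb = heap_mb * sort_percent // 100  # integer math avoids float rounding
    return {
        "mapred.child.java.opts": f"-Xmx{heap_mb}m",
        "io.sort.mb": io_sort_mb,
    }


def estimate_spills(map_output_mb, io_sort_mb, spill_percent=0.80):
    """Roughly estimate how many times one map task spills to disk.

    Assumes the buffer is flushed each time it fills to spill_percent
    (an assumed default) and ignores record-metadata overhead.
    """
    usable_mb = io_sort_mb * spill_percent
    return max(1, math.ceil(map_output_mb / usable_mb))


if __name__ == "__main__":
    settings = suggest_sort_settings(heap_mb=1000)
    print(settings)
    print(estimate_spills(map_output_mb=1500, io_sort_mb=settings["io.sort.mb"]))
```

If the estimate comes out above one spill, the reply's advice is to accept the extra spills rather than raise the heap and io.sort.mb to the point where the node starts swapping.
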
>-----Original Message-----
>From: 李钰 [mailto:car...@gmail.com]
>Sent: Wednesday, June 23, 2010 12:27 PM
>To: common-dev@hadoop.apache.org
>Subject: Questions about recommendation value of the "io.sort.mb"
>parameter
>
>Dear all,
>
>Here I've got a question about the "io.sort.mb" parameter. We can find
>material from Yahoo! or Cloudera which recommends setting this value to 200
>if the job scale is large, but I'm confused about this. As far as I know,
>the tasktracker will launch a child JVM for each task, and "io.sort.mb"
>p[...]