Hi Todd,

Thanks a lot for your detailed explanation and recommendation, it really
helps!
Best Regards,
Carp

2010/6/26 Todd Lipcon <t...@cloudera.com>

> 2010/6/25 Yu Li <car...@gmail.com>
>
>> Hi Todd,
>>
>> Sorry to bother you again, but could you further explain the 24 bytes of
>> additional overhead for each record of map output? What causes the
>> overhead, and what is it for? Thanks a lot.
>
> I actually misremembered, sorry - it's 16 bytes.
>
> In the kvindices buffer:
>   4 bytes for the partition ID of each record
>   4 bytes for the key offset in the data buffer
>   4 bytes for the value offset in the data buffer
>
> In the kvoffsets buffer:
>   4 bytes for an index into the kvindices buffer (this is so that the
>   spill sort can just move indices around instead of the entire objects)
>
> For more detail, I would recommend reading the code, or looking for Chris
> Douglas's slides from the HUG earlier this year, where he gave a very
> informative talk on the evolution of the map-side spill.
>
> -Todd
>
>> Best Regards,
>> Carp
>>
>> On June 24, 2010 at 1:49 AM, Todd Lipcon <t...@cloudera.com> wrote:
>>
>>> Plus there's some overhead for each record of map output - specifically,
>>> 24 bytes. So if you output 64MB worth of data, but each of your objects
>>> is only 24 bytes long itself, you need more than 128MB worth of spill
>>> space for it. Last, the map output buffer begins spilling when it is
>>> partially full, so that more records can be collected while the spill
>>> proceeds.
>>>
>>> 200MB of io.sort.mb has enough headroom for most 64M input splits that
>>> don't blow up the data a lot. Expanding much above 200M doesn't buy you
>>> much for most jobs. The good news is that it's easy to tell from the
>>> logs how many times the map tasks are spilling. If you're only spilling
>>> once, more io.sort.mb will not help.
>>>
>>> -Todd
>>>
>>> 2010/6/23 李钰 <car...@gmail.com>
>>>
>>>> Hi Jeff,
>>>>
>>>> Thanks for your quick reply. It seems my thinking was stuck on the
>>>> style of job I'm running.
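Todd's per-record accounting can be put into numbers. A minimal sketch, assuming the corrected 16-byte figure (12 bytes in kvindices plus 4 in kvoffsets); the function name and record sizes are invented for illustration:

```python
# Per-record metadata in the map output buffer, per Todd's breakdown:
# kvindices: partition ID (4) + key offset (4) + value offset (4) = 12 bytes
# kvoffsets: index into kvindices = 4 bytes
METADATA_BYTES = 16

def spill_space_needed(num_records, avg_record_bytes):
    """Total buffer bytes consumed: serialized data plus record metadata."""
    data = num_records * avg_record_bytes
    metadata = num_records * METADATA_BYTES
    return data + metadata

# 64 MB of output made of tiny 24-byte records: metadata adds
# two-thirds again on top of the raw data.
records = (64 * 1024 * 1024) // 24
total = spill_space_needed(records, 24)
print(total / (1024 * 1024))  # ~106.7 MB, well above the 64 MB of raw data
```

This is why a buffer sized only to the raw output can still spill: the smaller the records, the larger the metadata share.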
>>>> Now I'm much clearer about it.
>>>>
>>>> Best Regards,
>>>> Carp
>>>>
>>>> 2010/6/23 Jeff Zhang <zjf...@gmail.com>
>>>>
>>>>> Hi 李钰,
>>>>>
>>>>> The size of the map output depends on your Mapper class, which does
>>>>> the processing on the input data.
>>>>>
>>>>> 2010/6/23 李钰 <car...@gmail.com>:
>>>>>
>>>>>> Hi Sriguru,
>>>>>>
>>>>>> Thanks a lot for your comments and suggestions!
>>>>>> I still have some questions: since map mainly does data preparation,
>>>>>> say, splitting input data into KVPs, sorting, and partitioning before
>>>>>> the spill, would the size of the map output KVPs be much larger than
>>>>>> the input data size? If not, since one map task deals with one input
>>>>>> split, and one input split is usually 64M, the map KVP size would be
>>>>>> approximately 64M. Could you please give me an example of map output
>>>>>> much larger than the input split? It has really confused me for some
>>>>>> time, thanks.
>>>>>>
>>>>>> Others, I would also badly need your help if you know about this,
>>>>>> thanks.
>>>>>>
>>>>>> Best Regards,
>>>>>> Carp
>>>>>>
>>>>>> On June 23, 2010 at 5:11 PM, Srigurunath Chakravarthi
>>>>>> <srig...@yahoo-inc.com> wrote:
>>>>>>
>>>>>>> Hi Carp,
>>>>>>> Your assumption is right that this is a per-map-task setting.
>>>>>>> However, this buffer stores map output KVPs, not input. Therefore
>>>>>>> the optimal value depends on how much data your map task is
>>>>>>> generating.
>>>>>>>
>>>>>>> If your output per map is greater than io.sort.mb, these rules of
>>>>>>> thumb could work for you:
>>>>>>>
>>>>>>> 1) Increase the max heap of map tasks to use RAM better, but don't
>>>>>>>    hit swap.
>>>>>>> 2) Set io.sort.mb to ~70% of the heap.
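Sriguru's two rules of thumb can be sketched as a tiny sizing helper. The 70% figure is his suggestion from this thread, not an official Hadoop formula, and the function name is invented for illustration:

```python
def suggested_io_sort_mb(map_task_heap_mb):
    """Rule of thumb from the thread: io.sort.mb at ~70% of the
    map task's max heap (-Xmx), leaving headroom to avoid swapping."""
    return int(map_task_heap_mb * 0.7)

# Suggested io.sort.mb for a few common map task heap sizes:
for heap in (512, 1024, 2048):
    print(heap, "MB heap ->", suggested_io_sort_mb(heap), "MB buffer")
```

For example, a 1024 MB map task heap would suggest an io.sort.mb of about 716 MB under this rule; in practice the value is also capped by how much output the map actually generates.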
>>>>>>> Overall, causing extra "spills" (because of insufficient
>>>>>>> io.sort.mb) is much better than risking swapping (by setting
>>>>>>> io.sort.mb and the heap too large), in terms of the relative
>>>>>>> performance penalty you will pay.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Sriguru
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: 李钰 [mailto:car...@gmail.com]
>>>>>>>> Sent: Wednesday, June 23, 2010 12:27 PM
>>>>>>>> To: common-dev@hadoop.apache.org
>>>>>>>> Subject: Questions about recommendation value of the "io.sort.mb"
>>>>>>>> parameter
>>>>>>>>
>>>>>>>> Dear all,
>>>>>>>>
>>>>>>>> I have a question about the "io.sort.mb" parameter. Material from
>>>>>>>> Yahoo! and Cloudera recommends setting this value to 200 if the
>>>>>>>> job scale is large, but I'm confused by this. As I understand it,
>>>>>>>> the tasktracker launches a child JVM for each task, and
>>>>>>>> "io.sort.mb" is the size of the in-memory buffer inside one map
>>>>>>>> task child JVM. The default value of 100MB should be large enough,
>>>>>>>> because the input split of one map task is usually 64MB, as large
>>>>>>>> as the block size we usually set. Then why is the recommended
>>>>>>>> "io.sort.mb" 200MB for large jobs (and it really works)? How does
>>>>>>>> the job size affect the procedure? Is there any fault in my
>>>>>>>> understanding? Any comment or suggestion will be highly valued;
>>>>>>>> thanks in advance.
>>>>>>>> Best Regards,
>>>>>>>> Carp
>>>>>
>>>>> --
>>>>> Best Regards
>>>>>
>>>>> Jeff Zhang
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
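Putting the thread's numbers together, the effect of the 200MB recommendation can be sketched as a rough spill-count estimate. This is a simplification: the 0.80 threshold is Hadoop's io.sort.spill.percent default, and the real accounting also reserves part of the buffer for per-record metadata, so actual spill counts can be higher.

```python
import math

SPILL_PERCENT = 0.80  # io.sort.spill.percent default: spill starts at 80% full

def estimated_spills(map_output_mb, io_sort_mb):
    """Rough lower-bound spill count for one map task,
    ignoring per-record metadata overhead."""
    usable_mb = io_sort_mb * SPILL_PERCENT
    return max(1, math.ceil(map_output_mb / usable_mb))

# A 64 MB split whose map output doubles to 128 MB:
print(estimated_spills(128, 100))  # 2 spills with the default 100 MB buffer
print(estimated_spills(128, 200))  # 1 spill with the recommended 200 MB
```

This matches Todd's diagnostic advice: check the task logs for spill counts, and if a map task already spills only once, raising io.sort.mb further buys nothing.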