Dear all,

I have a question about the "io.sort.mb" parameter. Material from Yahoo! and
Cloudera recommends setting this value to 200 for large jobs, but I'm confused
by this. As I understand it, the tasktracker launches a child JVM for each
task, and io.sort.mb is the size of the in-memory sort buffer inside one map
task's child JVM. The default of 100MB should therefore be large enough,
because the input split of one map task is usually 64MB, the same as the block
size we usually set. So why is io.sort.mb recommended to be 200MB for large
jobs (and raising it really does help)? How can the size of the job affect
what happens inside a single map task?
Is there a flaw in my understanding? Any comment or suggestion will be
highly appreciated. Thanks in advance.
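
For reference, this is roughly how I set it in the job driver at the moment
(the class name, heap size, and the omitted job-setup lines are just
placeholders from my own code, not something taken from the Yahoo!/Cloudera
material):

    import org.apache.hadoop.mapred.JobConf;

    public class MyJob {
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyJob.class);
            // Map-side sort buffer, in MB (default is 100).
            conf.setInt("io.sort.mb", 200);
            // Child JVM heap; as far as I know it must leave room for the
            // sort buffer, so I raise it together with io.sort.mb.
            conf.set("mapred.child.java.opts", "-Xmx512m");
            // ... set mapper/reducer, input/output paths,
            // then submit with JobClient.runJob(conf)
        }
    }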

Best Regards,
Carp
