Thanks, Bobby.

I will run the experiment more times at each setting.
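To be concrete, I plan to do roughly what you describe for each cluster size: 5 runs, drop the fastest and slowest as possible outliers, and average the rest. A small sketch of that bookkeeping (the runtimes below are made up, just to show the calculation):

    import java.util.Arrays;

    // Trimmed mean of several benchmark runs for one cluster size:
    // sort the runtimes, drop the fastest and slowest as possible
    // outliers, and average the rest (assumes at least 3 runs).
    public class TrimmedMean {
        static double trimmedMean(double[] runtimes) {
            double[] sorted = runtimes.clone();
            Arrays.sort(sorted);
            double sum = 0.0;
            for (int i = 1; i < sorted.length - 1; i++) { // skip min and max
                sum += sorted[i];
            }
            return sum / (sorted.length - 2);
        }

        public static void main(String[] args) {
            // hypothetical job runtimes in seconds for one setting
            double[] runs = {412.0, 398.5, 455.1, 401.2, 407.8};
            System.out.printf("trimmed mean = %.1f s%n", trimmedMean(runs));
        }
    }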

Are there any more fine-grained profiling tools for individual tasks? For
example, something that shows CPU utilization and disk and network I/O per task.
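That is, something more detailed than the built-in counters (CPU_MILLISECONDS, FILE_BYTES_READ/WRITTEN, and so on) or the per-task HPROF hook, which as far as I can tell is enabled roughly like this (hadoop-1.x JobConf API; please correct me if I have the properties wrong):

    import org.apache.hadoop.mapred.JobConf;

    // Rough sketch: turn on the built-in per-task HPROF profiling for a
    // few map and reduce tasks (hadoop-1.x JobConf API; properties are
    // mapred.task.profile and mapred.task.profile.{maps,reduces,params}).
    // The hprof output should end up next to the task logs.
    public class EnableTaskProfiling {
        public static void main(String[] args) {
            JobConf conf = new JobConf(EnableTaskProfiling.class);
            conf.setProfileEnabled(true);            // mapred.task.profile
            conf.setProfileTaskRange(true, "0-2");   // map tasks 0-2
            conf.setProfileTaskRange(false, "0-2");  // reduce tasks 0-2
            conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,"
                + "depth=6,force=n,thread=y,verbose=n,file=%s");
            // ... configure mapper/reducer and submit the job as usual ...
        }
    }

But that only covers CPU and heap inside the task JVM; I don't see how to get per-task disk and network I/O from it, which is why I am asking.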

2013/9/9 Robert Evans <ev...@yahoo-inc.com>

> How many times did you run the experiment at each setting?  What is the
> standard deviation for each of these settings?  It could be that you are
> simply running into the error bounds of Hadoop.  Hadoop is far from
> consistent in its performance.  For our benchmarking we typically run
> the test 5 times, throw out the top and bottom results as possible
> outliers, and then average the other runs.  Even with that we have to be
> very careful that we weed out bad nodes or the numbers are useless for
> comparison.  The other thing to look at is where all of the time was spent
> for each of these settings.  The map portion should be very close to
> linear with the number of tasks, assuming that there is no disk or network
> contention.  The shuffle is far from linear, as the number of fetches is a
> function of the number of maps and the number of reducers.  The reduce
> phase itself should be close to linear, assuming that there isn't much skew
> in your data.
>
> --Bobby
>
> On 9/7/13 3:33 AM, "牛兆捷" <nzjem...@gmail.com> wrote:
>
> >But I still want to find the most efficient assignment and scale both data
> >and nodes as you said. For example, in my result 2 is the best, and 8 is
> >better than 4.
> >
> >Why is it sub-linear from 2 to 4 and super-linear from 4 to 8? I find it
> >hard to model this result. Can you give me some hints about this kind of
> >trend?
> >
> >
> >2013/9/7 Vinod Kumar Vavilapalli <vino...@hortonworks.com>
> >
> >>
> >> Clearly your input size isn't changing. And depending on how they are
> >> distributed on the nodes, there could be DataNode/disk contention.
> >>
> >> The better way to model this is by scaling the input data also linearly.
> >> More nodes should process more data in the same amount of time.
> >>
> >> Thanks,
> >> +Vinod
> >>
> >> On Sep 6, 2013, at 8:27 AM, 牛兆捷 wrote:
> >>
> >> > Hi all:
> >> >
> >> > I vary the number of computational nodes in the cluster and get the
> >> > speedup result in the attachment.
> >> >
> >> > In my mind, there are three types of speedup models: linear, sub-linear
> >> > and super-linear. However, the curve of my result seems a little
> >> > strange. I have attached it.
> >> > <speedup.png>
> >> >
> >> > This is the sort in example.jar; it only uses the default map-reduce
> >> > mechanism of Hadoop.
> >> >
> >> > I use hadoop-1.2.1 and set 8 map slots and 8 reduce slots per node
> >> > (12 CPUs, 20 GB memory).
> >> > io.sort.mb = 512, block size = 512 MB, heap size = 1024 MB,
> >> > reduce.slowstart = 0.05; the others are default.
> >> >
> >> > Input data: 20 GB, divided into 64 files
> >> >
> >> > Sort example: 64 map tasks, 64 reduce tasks
> >> >
> >> > Computational nodes: varying from 2 to 9
> >> >
> >> > Why does the speedup behave like this? How can I model it properly?
> >> >
> >> > Thanks~
> >> >
> >> > --
> >> > Sincerely,
> >> > Zhaojie
> >> >
> >>
> >>
> >
> >
> >
> >--
> >Sincerely,
> >Zhaojie
>
>


-- 
Sincerely,
Zhaojie
