Re: hadoop1.2.1 speedup model

2013-09-09 Thread 牛兆捷
Thanks, Bobby. I will run it more times. Are there any more fine-grained profiling tools for each task? For example, CPU utilization, disk and network I/O for each task. 2013/9/9 Robert Evans > How many times did you run the experiment at each setting? What is the > standard deviation for each o

Re: hadoop1.2.1 speedup model

2013-09-09 Thread Robert Evans
How many times did you run the experiment at each setting? What is the standard deviation for each of these settings? It could be that you are simply running into the error bounds of Hadoop. Hadoop is far from consistent in its performance. For our benchmarking we typically will run the test 5
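A minimal sketch of the check suggested above: repeat the job at each setting, then compare the mean and sample standard deviation before reading anything into the difference. The helper names and the runtimes below are hypothetical placeholders, not measurements from this thread.

```python
import statistics


def summarize_runs(runtimes_s):
    """Return (mean, sample standard deviation) of repeated job runtimes in seconds."""
    if len(runtimes_s) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(runtimes_s), statistics.stdev(runtimes_s)


def ranges_overlap(runs_a, runs_b, k=1.0):
    """True if the mean +/- k*stdev ranges of two settings overlap.

    If they overlap, the difference between the settings may just be noise.
    """
    mean_a, sd_a = summarize_runs(runs_a)
    mean_b, sd_b = summarize_runs(runs_b)
    return abs(mean_a - mean_b) <= k * (sd_a + sd_b)


# Hypothetical runtimes in seconds (made up for illustration only).
runs_4_nodes = [100.0, 110.0, 90.0, 105.0, 95.0]
runs_8_nodes = [60.0, 70.0, 50.0, 65.0, 55.0]

if __name__ == "__main__":
    print(summarize_runs(runs_4_nodes))
    print(summarize_runs(runs_8_nodes))
    print(ranges_overlap(runs_4_nodes, runs_8_nodes))
```

The overlap test is only a rough screen; with just a handful of runs per setting the standard deviation itself is a noisy estimate.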

Re: hadoop1.2.1 speedup model

2013-09-07 Thread 牛兆捷
But I still want to find the most efficient assignment and scale both data and nodes as you said. For example, in my results 2 is the best, and 8 is better than 4. Why is it sub-linear from 2 to 4 but super-linear from 4 to 8? I find this result hard to model. Can you give me some hint about thi

Re: hadoop1.2.1 speedup model

2013-09-07 Thread Vinod Kumar Vavilapalli
Clearly your input size isn't changing. And depending on how the input is distributed on the nodes, there could be DataNode/disk contention. The better way to model this is by scaling the input data linearly as well. More nodes should process more data in the same amount of time. Thanks, +Vinod On
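One way to read this suggestion is as a weak-scaling experiment: grow the input in proportion to the node count and check whether the runtime stays flat. The sketch below assumes that reading; the function names and the example numbers are hypothetical, not taken from the thread.

```python
def scaled_input_gb(base_input_gb, base_nodes, nodes):
    """Input size that keeps the amount of data per node constant."""
    return base_input_gb * nodes / base_nodes


def weak_scaling_efficiency(base_time_s, scaled_time_s):
    """Runtime at the base setting divided by runtime at the scaled setting.

    1.0 means the larger cluster handled proportionally more data in the
    same time; values below 1.0 indicate overhead or contention.
    """
    return base_time_s / scaled_time_s


if __name__ == "__main__":
    # Hypothetical: 10 GB on 2 nodes as the baseline, scaled to 8 nodes.
    print(scaled_input_gb(10, 2, 8))
    # Hypothetical runtimes: 100 s at baseline, 125 s at the scaled setting.
    print(weak_scaling_efficiency(100.0, 125.0))
```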

Re: hadoop1.2.1 speedup model

2013-09-06 Thread 牛兆捷
From 2 to 4, the performance increases sub-linearly; however, from 4 to 8, it seems super-linear. Is it caused by some disk contention bottleneck? 2013/9/6 牛兆捷 > Hi all: > > I vary the number of computational nodes in the cluster and get the speedup results in the > attachment. > > In my mind, there are three typ
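For reference, a small sketch of how each step between node counts can be labeled sub-linear or super-linear with a fixed input size: compare the measured speedup for the step against the ratio of node counts. The runtimes are hypothetical values chosen only to reproduce the pattern described (the attachment is not available here), and the 5% tolerance is an arbitrary assumption.

```python
def classify_steps(times_by_nodes, tol=0.05):
    """Label each step between consecutive node counts.

    times_by_nodes maps node count -> runtime in seconds for the same input.
    Returns a list of (from_nodes, to_nodes, measured_speedup, label).
    """
    nodes = sorted(times_by_nodes)
    results = []
    for prev, cur in zip(nodes, nodes[1:]):
        ideal = cur / prev  # linear speedup expected for this step
        measured = times_by_nodes[prev] / times_by_nodes[cur]
        ratio = measured / ideal
        if ratio > 1 + tol:
            label = "super-linear"
        elif ratio < 1 - tol:
            label = "sub-linear"
        else:
            label = "linear"
        results.append((prev, cur, measured, label))
    return results


# Hypothetical runtimes in seconds, made up for illustration only.
hypothetical_times = {2: 400.0, 4: 300.0, 8: 120.0}

if __name__ == "__main__":
    for step in classify_steps(hypothetical_times):
        print(step)
```

Feeding it mean runtimes from repeated runs, rather than single measurements, makes the labels less sensitive to run-to-run variance.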