Thanks Jothi...

-Tarandeep

On Fri, Jun 12, 2009 at 4:35 AM, Jothi Padmanabhan <[email protected]> wrote:

> If the number of maps is reduced, the output of each individual map is
> likely to grow. A couple of possible issues come to mind immediately:
> 1. Each map may spill more often, which adds extra cost when the spills
> are merged.
> 2. While the reducers may pull in more data per fetch (which is good), a
> single map output may also become too large for a reducer to hold in
> memory, so it has to be shuffled to disk instead.
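A rough way to see both effects is to estimate them numerically. This is a simplified sketch, assuming Hadoop 0.20-era defaults (io.sort.mb=100, io.sort.spill.percent=0.80, io.sort.factor=10, a 200 MB reducer heap, mapred.job.shuffle.input.buffer.percent=0.70) and that a reducer keeps a single map output in memory only if it is under roughly 25% of its shuffle buffer. It ignores per-record accounting space, compression, combiners and the exact merge schedule, so treat the numbers as orders of magnitude. The function names are hypothetical.

```python
import math

def map_spills(map_output_mb: float, sort_mb: int = 100, spill_percent: float = 0.80) -> int:
    """Approximate number of spill files a single map task writes."""
    # A spill is triggered each time the in-memory sort buffer is spill_percent full.
    return max(1, math.ceil(map_output_mb / (sort_mb * spill_percent)))

def merge_passes(spills: int, sort_factor: int = 10) -> int:
    """Approximate merge rounds needed to combine the spills into one file."""
    passes = 0
    while spills > 1:
        # Each round merges groups of up to sort_factor files into one file per group.
        spills = math.ceil(spills / sort_factor)
        passes += 1
    return passes

def fits_in_reducer_memory(map_output_mb: float, num_reducers: int,
                           reducer_heap_mb: int = 200,
                           shuffle_buffer_percent: float = 0.70,
                           single_segment_fraction: float = 0.25) -> bool:
    """True if one map's slice for one reducer can be shuffled into memory."""
    segment_mb = map_output_mb / num_reducers  # assumes keys spread evenly over reducers
    limit_mb = reducer_heap_mb * shuffle_buffer_percent * single_segment_fraction
    return segment_mb < limit_mb

for out_mb in (64, 1024):
    spills = map_spills(out_mb)
    print(out_mb, "MB map output:", spills, "spill(s),",
          merge_passes(spills), "merge round(s), in-memory shuffle:",
          fits_in_reducer_memory(out_mb, num_reducers=8))
```

Under these assumptions, a map emitting 64 MB writes one spill, while a map emitting 1 GB writes about 13 spills that need 2 merge rounds; with 8 reducers, the 1 GB case also sends 128 MB slices, well over the roughly 35 MB per-segment in-memory limit.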
>
> JVM reuse should help, but if individual tasks take a long time to
> complete, JVM startup is only a small fraction of the total and there
> might not be any discernible performance gain.
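The point about long-running tasks can be made concrete with a small calculation. This is a sketch, not a measurement: startup_s is a hypothetical per-JVM launch cost you would need to measure, and tasks_per_jvm stands in for the JVM reuse setting (in Hadoop 0.19/0.20 this is mapred.job.reuse.jvm.num.tasks, where 1 means no reuse and -1 means no limit; for -1, pass the number of tasks the slot will actually run).

```python
def jvm_overhead_fraction(task_s: float, startup_s: float, tasks_per_jvm: int) -> float:
    """Fraction of a task slot's busy time spent launching JVMs.

    task_s: running time of one task in seconds.
    startup_s: cost of starting one JVM (hypothetical; measure it on your cluster).
    tasks_per_jvm: how many tasks each JVM runs before it exits.
    """
    if tasks_per_jvm < 1:
        raise ValueError("pass the actual number of tasks per JVM (>= 1)")
    startup_per_task = startup_s / tasks_per_jvm  # launch cost amortised over reused tasks
    return startup_per_task / (startup_per_task + task_s)

# Short tasks: reuse cuts the overhead noticeably.
print(f"{jvm_overhead_fraction(10, 1, 1):.1%} -> {jvm_overhead_fraction(10, 1, 10):.1%}")
# Long tasks: the overhead was already negligible without reuse.
print(f"{jvm_overhead_fraction(600, 1, 1):.2%} -> {jvm_overhead_fraction(600, 1, 10):.2%}")
```

With a 1 s startup, 10 s tasks spend about 9% of slot time on JVM launches without reuse and about 1% when each JVM runs 10 tasks; with 600 s tasks the figure is under 0.2% even without reuse, so reuse has little left to save.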
>
> Jothi
>
>
> On 6/11/09 11:36 PM, "Tarandeep Singh" <[email protected]> wrote:
>
> > Hi,
> >
> > I am trying to understand the effects of increasing the block size or the
> > minimum split size. If I increase them, each mapper will process more data,
> > which reduces the number of mappers that are spawned. Since there is an
> > overhead in starting mappers, this seems good.
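For reference, the split size FileInputFormat ends up using depends on the block size and on both split-size bounds, not on any one of them alone. The sketch below mirrors computeSplitSize() in the new (org.apache.hadoop.mapreduce) API, max(minSize, min(maxSize, blockSize)), plus the 10% slop rule in getSplits() that lets the last split run slightly over. It models a single uncompressed file and ignores record boundaries and block locations; the Python names are illustrative. The old mapred API also factors in a goal size derived from the requested number of maps, which is not modelled here.

```python
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

def compute_split_size(block_size: int, min_size: int, max_size: int) -> int:
    """max(minSize, min(maxSize, blockSize)), as in the new-API FileInputFormat."""
    return max(min_size, min(max_size, block_size))

def split_file(length: int, split_size: int) -> list[int]:
    """Lengths of the splits that a single file of `length` bytes is cut into."""
    splits = []
    remaining = length
    # Keep cutting full-size splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append(split_size)
        remaining -= split_size
    # Whatever is left (up to 1.1 * split_size) becomes the final split.
    if remaining > 0:
        splits.append(remaining)
    return splits

MB = 1024 * 1024
LONG_MAX = 2**63 - 1  # default upper bound on split size
for min_split in (1, 256 * MB):
    size = compute_split_size(64 * MB, min_split, LONG_MAX)
    print(size // MB, "MB splits ->", len(split_file(1024 * MB, size)), "maps for a 1 GB file")
```

With 64 MB blocks, a 1 GB file yields 16 splits; raising the minimum split size to 256 MB yields 4. Once the split size exceeds the block size, each split spans several blocks that may live on different nodes, so part of each map's input is read over the network.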
> >
> > However, if I increase these values too much, what negative effects will
> > come up? In other words, how do I compute the best number of mappers to
> > start for processing a given amount of data on a cluster?
> >
> > For the calculations, let us assume 100 GB of data and 4 machines (dual core).
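One way to reason about the 100 GB / 4 machine case is to count map tasks and the number of "waves" they run in. This is a back-of-the-envelope sketch under stated assumptions: 2 map slots per dual-core machine (the default for mapred.tasktracker.map.tasks.maximum), 1 GB = 1024 MB, every split exactly split_mb, and equal task durations. Real jobs also see stragglers, speculative execution and locality effects, none of which are modelled.

```python
import math

def map_waves(data_mb: int, split_mb: int, machines: int, slots_per_machine: int):
    """Return (number of map tasks, number of waves, fraction of slots busy in the last wave)."""
    num_maps = math.ceil(data_mb / split_mb)
    slots = machines * slots_per_machine
    waves = math.ceil(num_maps / slots)
    # A low fill fraction means cores sit idle while the final wave finishes.
    last_wave_fill = (num_maps - (waves - 1) * slots) / slots
    return num_maps, waves, last_wave_fill

for split_mb in (64, 128, 256, 1024, 4096):
    maps, waves, fill = map_waves(100 * 1024, split_mb, machines=4, slots_per_machine=2)
    print(f"split={split_mb:>5} MB  maps={maps:>5}  waves={waves:>4}  last wave {fill:.0%} full")
```

Small splits give many full waves, each paying task startup cost. With 1 GB splits there are 100 maps: 12 full waves of 8, then a 13th in which only 4 of the 8 slots are busy, and a single slow task now delays the job by a larger amount. The Hadoop MapReduce tutorial suggests maps should take at least a minute to execute; beyond that, in this simple model, keeping the map count close to a multiple of the slot count avoids a mostly idle final wave.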
> >
> > Also, if I set the JVM reuse flag to -1, will it make a difference?
> >
> > Thanks,
> > Tarandeep
>
>
