Hi,

I am trying to understand the effects of increasing the block size or the
minimum split size. If I increase either of them, each mapper will process
more data, which reduces the number of mappers that are spawned. Since there
is an overhead in starting each mapper, this seems good.

However, if I increase these values too much, what negative effects will
come up? In other words, how do I compute the best number of mappers to
start for processing a given amount of data on a cluster?

For the calculations, let us assume 100 GB of data and 4 machines (dual core each).
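
To make the numbers concrete, here is a rough back-of-the-envelope sketch of how I am counting mappers. It assumes one map task per split, a split size of max(minSplit, min(maxSplit, blockSize)) as I understand FileInputFormat computes it, a single large input file, and one map slot per core (so 8 slots in total). The slot count and the function names are my own assumptions for illustration, not anything read from a real cluster, and I treat 100G as 100 GiB.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def split_size(block_size, min_split=1, max_split=2 ** 63 - 1):
    """Split size as max(min_split, min(max_split, block_size))."""
    return max(min_split, min(max_split, block_size))


def estimate(data_bytes, block_size, min_split=1, max_split=2 ** 63 - 1,
             machines=4, slots_per_machine=2):
    """Return (mappers, waves) for one large file.

    slots_per_machine=2 is an assumption (one map slot per core).
    A "wave" is one round in which every slot runs a single map task.
    """
    size = split_size(block_size, min_split, max_split)
    mappers = ceil_div(data_bytes, size)
    slots = machines * slots_per_machine
    waves = ceil_div(mappers, slots)
    return mappers, waves


if __name__ == "__main__":
    for block_mib in (64, 128, 512, 1024):
        mappers, waves = estimate(100 * GIB, block_mib * MIB)
        print(f"block={block_mib} MiB  mappers={mappers}  waves={waves}")
```

Under these assumptions, a larger block size means fewer mappers and fewer waves, but each mapper runs longer, so I would guess that a failed or slow task costs more to redo and that the last wave may leave some slots idle.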

Also, if I set the JVM reuse flag to -1, will it make a difference?

Thanks,
Tarandeep
