Hi, I am trying to understand the effects of increasing the block size or the minimum split size. If I increase them, each mapper processes more data, which reduces the number of mappers that are spawned. Since there is overhead in starting each mapper, this seems good.
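To make that concrete, here is a rough sketch of the arithmetic I have in mind. It assumes the split size follows the max(minSize, min(maxSize, blockSize)) rule that, as far as I understand, the newer FileInputFormat uses, that the input is one large splittable file, and that each core gives one map slot. The class and method names are made up for illustration and are not part of Hadoop.

```java
// Back-of-the-envelope estimate only: ignores per-file boundaries,
// data locality, and any slack Hadoop allows on the last split.
final class SplitEstimate {

    // Assumed split-size rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of map tasks: total input divided by split size, rounded up.
    static long numSplits(long totalBytes, long splitSize) {
        return (totalBytes + splitSize - 1) / splitSize;
    }

    // Number of "waves" needed when only mapSlots tasks can run at once.
    static long waves(long numSplits, long mapSlots) {
        return (numSplits + mapSlots - 1) / mapSlots;
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        final long totalBytes = 100L * 1024L * MB; // 100G of input
        final long mapSlots = 4 * 2;               // assumption: 4 machines x 2 cores, one slot per core

        for (long blockMb : new long[] {64, 128, 256, 512, 1024}) {
            // Minimum split size left at 1 byte, maximum left unbounded.
            long split = splitSize(blockMb * MB, 1L, Long.MAX_VALUE);
            long maps = numSplits(totalBytes, split);
            System.out.printf("split=%d MB  mappers=%d  waves=%d%n",
                    blockMb, maps, waves(maps, mapSlots));
        }
    }
}
```

Under these assumptions, 64 MB splits give 1600 mappers running in 200 waves, while 1 GB splits give 100 mappers in 13 waves, the last of which keeps only half of the slots busy.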
However, if I increase these values too much, what negative effects will appear? Put another way, how do I compute the best number of mappers to start for processing a given amount of data on a cluster? For the calculation, let us assume 100G of data and 4 dual-core machines. Also, if I set the JVM reuse flag to -1, will it make a difference?

Thanks,
Tarandeep
