Have a look at this doc: http://blog.cloudera.com/blog/2014/05/apache-spark-resource-management-and-yarn-app-models/
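There is no single rule, but as a rough illustration it may help to remember that the number of tasks in a stage equals the number of partitions of the RDD that stage runs on, so the knobs you turn are partition-related. A minimal Scala sketch (paths, sizes and numbers below are made up for illustration, not recommendations):

    // Minimal sketch: tasks per stage == partitions of the RDD in that stage.
    // All paths and partition counts here are hypothetical.
    import org.apache.spark.{SparkConf, SparkContext}

    object TaskCountSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("task-count-sketch"))

        // Reading from HDFS: by default one partition (hence one task) per HDFS
        // block (e.g. 128MB). The minPartitions hint can only raise that number.
        val lines = sc.textFile("hdfs:///data/input", minPartitions = 200)
        println(s"input partitions = ${lines.getNumPartitions}")

        // After a shuffle the task count follows spark.default.parallelism
        // (often set to 2-3x the total executor cores) unless overridden here.
        val counts = lines
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1L))
          .reduceByKey(_ + _, 200) // explicit reduce-side task count

        // repartition()/coalesce() change the partition (and thus task) count
        // mid-job, e.g. to avoid writing too many small output files.
        counts.coalesce(50).saveAsTextFile("hdfs:///data/output")

        sc.stop()
      }
    }

The 2-3x-cores and ~100ms guidelines you mention are really about keeping all cores busy and keeping per-task scheduling overhead small; the doc above covers how those tasks map onto YARN containers.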
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 18 April 2016 at 20:43, Dogtail L <spark.ru...@gmail.com> wrote:
> Hi,
>
> When launching a job in Spark, I have great trouble deciding the number of
> tasks. Some say it is better to create one task per HDFS block, i.e., make
> sure each task processes 128MB of input data; others suggest that the number
> of tasks should be twice the total number of cores available to the job. I
> have also seen the suggestion to launch small tasks in Spark, i.e., make
> sure each task lasts around 100ms.
>
> I am quite confused by all these suggestions. Is there any general rule
> for deciding the number of tasks in Spark? Thanks a lot!
>
> Best