On Tue, Jul 14, 2015 at 12:03 PM, Shushant Arora <shushantaror...@gmail.com> wrote:
> Can a container have multiple JVMs running in YARN?

Yes and no. A container runs a single command, but that process can start other processes, and those also count towards the resource usage of the container (mostly memory). For example, pyspark will spawn python processes from the main JVM.

But if you're asking about executors, ignoring pyspark or other non-Scala/Java backends, there will be a single JVM. Spark will allow a number of concurrent tasks to run that matches the number of vcores you requested for the executor.

> 1. Is the difference in a Hadoop Mapreduce job - say I specify 20 reducers
> and my job uses 10 map tasks, then does it need 30 containers or 30 vcores
> in total?

It's not that simple, and trying to compare that to Spark is kinda misleading.

-- Marcelo
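As a sketch of the executor point above: one YARN container holds one executor JVM, and the vcores requested per executor set its task concurrency. The flag values below are illustrative assumptions, not numbers from this thread.

```shell
# Illustrative sizing, assuming YARN mode; the numbers are hypothetical.
# --executor-cores sets the vcores requested per container, which is also
# how many tasks that single executor JVM will run concurrently.
spark-submit \
  --master yarn \
  --num-executors 5 \
  --executor-cores 4 \
  --executor-memory 4g \
  my_app.jar
# 5 containers, each running one executor JVM with up to 4 concurrent tasks
# (up to 20 concurrent tasks across the application).
```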