Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
Try to figure out what the env vars and arguments of the worker JVM and Python process are. Maybe you'll get a clue.
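
For example (a generic Linux sketch, not from the original message; the PID is a placeholder), the daemon's command line and environment can be read from /proc:

    # replace <pid> with the PID of a pyspark.daemon or executor JVM process
    tr '\0' ' ' < /proc/<pid>/cmdline; echo
    tr '\0' '\n' < /proc/<pid>/environ | grep -iE 'spark|python'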

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Thanks. I'll try that. Hopefully that should work.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
I started with a download of 1.6.0. These days, we use a self-compiled 1.6.2.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
I am trying to think of any possibilities as to why this could be happening. If the cores are multi-threaded, could that affect the daemons? Was your Spark built from source or downloaded as a binary? Though that should not technically change anything.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
1.6.1. I have no idea why it isn't working for you. SPARK_WORKER_CORES should do the same thing as --cores.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Which version of Spark are you using? 1.6.1? Any ideas as to why it is not working in ours?

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
16.

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi, I tried what you suggested and started the slave using the following command: start-slave.sh --cores 1. But it still seems to start as many pyspark daemons as there are cores on the node (1 parent and 3 workers). Limiting it via the spark-env.sh file by setting SPARK_WORKER_CORES=1 didn't help either.
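
For cross-checking, the spark-env.sh route would normally look like this (paths and the master URL are placeholders; the worker has to be restarted for the setting to take effect):

    # $SPARK_HOME/conf/spark-env.sh
    export SPARK_WORKER_CORES=1

    # restart the worker so the setting is picked up
    $SPARK_HOME/sbin/stop-slave.sh
    $SPARK_HOME/sbin/start-slave.sh spark://<master-host>:7077

After the restart, the master UI (http://<master-host>:8080) should report 1 core for that worker.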

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
It depends on what you want to do. If, on any given server, you don't want Spark to use more than one core, use this to start the workers: SPARK_HOME/sbin/start-slave.sh --cores=1. If you have a bunch of servers dedicated to Spark, but you don't want a driver to use more than one core per server, ...
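
The message is truncated here. One plausible way to handle the second case, sketched as an assumption rather than as the original suggestion, is to cap the application at submit time (the master URL, the total of 4 cores, and the script name are placeholders):

    spark-submit \
      --master spark://<master-host>:7077 \
      --conf spark.executor.cores=1 \
      --total-executor-cores 4 \
      your_app.py

With the default standalone scheduling (spark.deploy.spreadOut=true), executors are spread across workers, so this tends toward one core per server.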

Re: Limiting Pyspark.daemons

2016-07-04 Thread Ashwin Raaghav
Hi Mathieu, Isn't that the same as setting "spark.executor.cores" to 1? And how can I specify "--cores=1" from the application?
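
As an aside (not from the original reply): --cores on start-slave.sh limits the whole worker, while spark.executor.cores caps each executor an application requests. From the application side the property can also be set once in the defaults file instead of on every submit:

    # $SPARK_HOME/conf/spark-defaults.conf
    spark.executor.cores   1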

Re: Limiting Pyspark.daemons

2016-07-04 Thread Mathieu Longtin
When running the executor, pass --cores=1. We use this, and I only see 2 pyspark processes; one seems to be the parent of the other and is idle. In your case, are all the pyspark processes working?

Limiting Pyspark.daemons

2016-07-04 Thread ar7
Hi, I am currently using PySpark 1.6.1 in my cluster. When a pyspark application is run, the load on the workers seems to go higher than what was allocated. When I ran top, I noticed that there were too many pyspark.daemon processes running. There was another mail thread regarding the same: https://ma...
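
A quick way to quantify this on a worker node (a generic check, not part of the original thread) is to count the daemon processes and watch their CPU share:

    # count pyspark.daemon processes on this node
    pgrep -fc pyspark.daemon

    # list them with their CPU usage
    ps -eo pid,pcpu,args | grep [p]yspark.daemon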