There is only one executor on each worker. I see one pyspark.daemon, but
when the streaming job starts a batch I see it spawn 4 other
pyspark.daemon processes. After the batch completes, the 4 pyspark.daemon
processes die and only one is left.
I think this behavior was introduced by th…
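For anyone who wants to reproduce this observation, here is a rough sketch (assuming a Linux worker node with the ps utility available; the helper name and the 5-second polling interval are arbitrary, not from this thread) that counts live pyspark.daemon processes over time:

    # Rough sketch: poll the process table and count pyspark.daemon processes.
    # Assumes a Linux worker with ps; names and intervals are illustrative only.
    import subprocess
    import time

    def count_pyspark_daemons():
        out = subprocess.check_output(["ps", "-eo", "args"]).decode()
        return sum(1 for line in out.splitlines() if "pyspark.daemon" in line)

    while True:
        print(time.strftime("%H:%M:%S"),
              count_pyspark_daemons(), "pyspark.daemon processes")
        time.sleep(5)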
Hi Ken,
It may also be related to Grid Engine job scheduling. If the node has 16
(virtual?) cores, Grid Engine allocates 16 slots; if you use 'max' scheduling,
it will send 16 processes sequentially to the same machine, and on top of that
each Spark job has its own executors. Limit the number of jobs sc…
> … (not from each machine). If not set, the default will be
> spark.deploy.defaultCores on Spark's standalone cluster manager, or
> infinite (all available cores) on Mesos.”

*David Newberger*
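For illustration only, a sketch of how such a cap could be applied from PySpark when the application is created; the values below (4 cores total, 1 per executor) and the app name are placeholders, not recommendations from this thread:

    # Sketch: cap the total cores this application requests across the cluster
    # (spark.cores.max is cluster-wide, not per machine). Values are examples.
    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("limit-cores-example")
            .set("spark.cores.max", "4")          # total cores for the app
            .set("spark.executor.cores", "1"))    # cores per executor
    sc = SparkContext(conf=conf)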
From: agateaaa [mailto:agate...@gmail.com]
Sent: Wednesday, June 15, 2016 4:39 PM
To: Gene Pang
Cc: Sven Krasser; Carlile, Ken; user
Subject: Re: Limit pyspark.daemon threads
Thx Gene! But my concern is with CPU usage, not memory. I want to see if
there is any way to control the number of pyspark.daemon processes that get
spawned. We have some restrictions on the number of CPUs we can use on a node,
and the number of pyspark.daemon processes that get created doesn't seem to
honor spark.executor.cores.
As Sven mentioned, you can use Alluxio to store RDDs in off-heap memory,
and you can then share that RDD across different jobs. If you would like to
run Spark on Alluxio, this documentation can help:
http://www.alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html
Thanks,
Gene
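A minimal sketch of the pattern Gene describes, writing an RDD to Alluxio so a separate job can read it back; the alluxio:// master host, port, and path are placeholders, and it assumes the Alluxio client is configured for the cluster:

    # Sketch: share an RDD between jobs via Alluxio. The alluxio:// URI is a
    # placeholder; the Alluxio client must be available to the Spark cluster.
    from pyspark import SparkContext

    sc = SparkContext(appName="alluxio-share-example")
    sc.parallelize(range(1000)).saveAsTextFile(
        "alluxio://alluxio-master:19998/shared/my_rdd")

    # In a different Spark job/application:
    shared = sc.textFile("alluxio://alluxio-master:19998/shared/my_rdd")
    print(shared.count())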
Hi,
I am seeing this issue too with pyspark (using Spark 1.6.1). I have set
spark.executor.cores to 1, but whenever a streaming batch starts processing
data I see the python -m pyspark.daemon processes increase gradually to about
5 (increasing CPU% on the box about 4-5 times; each pyspark.daem…
Hey Ken,
1. You're correct, cached RDDs live on the JVM heap. (There's an off-heap
storage option using Alluxio, formerly Tachyon, with which I have no
experience, however.)
2. The worker memory setting is not a hard maximum, unfortunately. What
happens is that during aggregation the Python daemon…
This is extremely helpful!
I’ll have to talk to my users about how the Python memory limit should be adjusted and what their expectations are. I’m fairly certain we bumped it up in the dark past when jobs were failing because of insufficient memory for the Python processes.
So just…
My understanding is that the spark.executor.cores setting controls the
number of worker threads in the executor JVM. Each worker thread then
communicates with a pyspark daemon process (these are not threads) to
stream data into Python. There should be one daemon process per worker
thread (bu…
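To see how many distinct Python workers actually handle tasks on each host, and compare that against spark.executor.cores, a small probe along these lines can help; the app name and partition count below are arbitrary choices for illustration:

    # Sketch: record the hostname and PID of the Python worker that runs each
    # partition, to see how many workers pyspark.daemon forks per host.
    import os
    import socket
    from pyspark import SparkContext

    sc = SparkContext(appName="python-worker-probe")

    def who_am_i(_partition_iter):
        yield (socket.gethostname(), os.getpid())

    workers = (sc.parallelize(range(32), 32)
                 .mapPartitions(who_am_i)
                 .distinct()
                 .collect())
    print(sorted(workers))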
Thanks, Sven!
I know that I’ve messed up the memory allocation, but I’m trying not to think too much about that (because I’ve advertised it to my users as “90GB for Spark works!”, and that’s how it displays in the Spark UI, totally ignoring the Python processes).
So I’ll need to deal with…
Hey Ken,
I also frequently see more pyspark daemons than the configured concurrency,
often by a low multiple. (There was an issue pre-1.3.0 that caused this
to be quite a bit higher, so make sure you at least have a recent version;
see SPARK-5395.)
Each pyspark daemon tries to stay below the configured…
Further data on this.
I’m watching another job right now where there are 16 pyspark.daemon threads, all of which are trying to get a full core (remember, this is a 16-core machine). Unfortunately, the Java process actually running the Spark worker is trying to take
several cores of its own…
No further input on this? I discovered today that the pyspark.daemon thread count was actually 48, which makes a little more sense (at least it’s a multiple of 16), and it seems to be happening at the reduce and collect portions of the code.
—Ken
On Mar 17, 2016, at 10:51 AM, Carlile,
I took a look at docs/configuration.md. Though I didn't find an answer for
your first question, I think the following pertains to your second question:

  spark.python.worker.memory    (default: 512m)
      Amount of memory to use per python worker process during aggregation,
      in the same format as JVM memory strings…
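If that limit needs adjusting, it can be set like any other Spark property when the context is created; the 2g value and app name below are just examples, not recommendations from the thread:

    # Sketch: raise the per-Python-worker aggregation memory before it spills
    # to disk. "2g" is an illustrative value only.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().set("spark.python.worker.memory", "2g")
    sc = SparkContext(conf=conf, appName="python-worker-memory-example")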
Thanks! I found that part just after I sent the email… whoops. I’m guessing that’s not an issue for my users, since it’s been set that way for a couple of years now.
The thread count is definitely an issue, though, since if enough nodes go down, they can’t schedule their spark clusters.