Dear Spark community,

I wanted to bring up an issue that was filed against Spark 1.6.1
(https://issues.apache.org/jira/browse/SPARK-15544) but also exists in Spark
2.3.0 (https://issues.apache.org/jira/browse/SPARK-23530).

We have run into this in production, where the Spark Master shuts down if
the ...
>>
>> Sent from my iPhone
>>
>> On Jun 15, 2016, at 8:53 AM, Gene Pang wrote:
>>
>> As Sven mentioned, you can use Alluxio to store RDDs in off-heap memory,
>> and you can then share that RDD across different jobs. If you would like to
>> run Spark on Alluxio, this documentation can help:
> “spark.cores.max: the maximum amount of CPU cores to request for the
> application from across the cluster (not from each machine). If not set,
> the default will be spark.deploy.defaultCores on Spark's standalone
> cluster manager, or infinite (all available cores) on Mesos.”
>
>
>
> *David Newberger*
>
>
>
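(For anyone who lands on this thread later: a minimal PySpark sketch of what
capping cores via spark.cores.max looks like on a standalone master. The master
URL, app name, and the value "4" are placeholders, not from this thread.)

    from pyspark import SparkConf, SparkContext

    # Cap the total number of cores this application requests across the
    # whole standalone cluster (not per machine); the value is a placeholder.
    conf = (SparkConf()
            .setAppName("cores-max-sketch")
            .setMaster("spark://master-host:7077")
            .set("spark.cores.max", "4"))

    sc = SparkContext(conf=conf)

The same property can also be passed to spark-submit with
--conf spark.cores.max=4.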
> *From:* agateaaa [mailto:agate...@gmail.com]
> *Sent:* Wednesday, June 15,
> If you would like to run Spark on Alluxio, this documentation can help:
> http://www.alluxio.org/documentation/master/en/Running-Spark-on-Alluxio.html
>
> Thanks,
> Gene
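(A rough PySpark sketch of Gene's suggestion, i.e. sharing RDD data between
jobs through Alluxio. The alluxio://host:port and path are placeholders, and
the Alluxio client jar needs to be on Spark's classpath; the linked docs cover
that setup.)

    from pyspark import SparkContext

    sc = SparkContext(appName="alluxio-share-sketch")

    # Job 1: write the RDD's data into Alluxio (in memory, off the JVM heap).
    rdd = sc.parallelize(range(100))
    rdd.saveAsTextFile("alluxio://alluxio-master:19998/shared/numbers")

    # Job 2 (can be a completely separate Spark application): read it back.
    shared = sc.textFile("alluxio://alluxio-master:19998/shared/numbers")
    print(shared.count())

    sc.stop()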
>
> On Tue, Jun 14, 2016 at 12:44 AM, agateaaa wrote:
>
>> Hi,
>>
>> I am seeing this issue too with pyspark (using Spark 1.6.1). I have set ...
  ... 92 S 0.0 0.0 0:00.38 python -m ...   <-- pyspark.daemon
Is there any way to control the number of pyspark.daemon processes that get
spawned?
Thank you
Agateaaa
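(My understanding, and please correct me if this is wrong: each executor starts
one "python -m pyspark.daemon", which forks a Python worker per concurrently
running task, so limiting concurrent tasks per executor and enabling worker
reuse bounds how many of these processes show up. A sketch of the relevant
settings; the values are placeholders.)

    from pyspark import SparkConf, SparkContext

    conf = (SparkConf()
            .setAppName("pyspark-daemon-sketch")
            # Fewer concurrent tasks per executor -> fewer forked Python workers.
            .set("spark.executor.cores", "1")
            # Reuse Python worker processes between tasks instead of forking
            # a fresh one for every task (the default is already "true").
            .set("spark.python.worker.reuse", "true"))

    sc = SparkContext(conf=conf)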
On Sun, Mar 27, 2016 at 1:08 AM, Sven Krasser wrote:
> Hey Ken,
>
> 1. You're correct, cached RDDs live on the JVM heap ...
Hi,

We recently started working on using Spark Streaming to fetch and process
data from Kafka (direct streaming, not receiver-based; Spark 1.5.2).

We want to be able to stop the streaming application, and tried implementing
the approach suggested above, using a stopping thread and calling
ssc.stop() ...
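(A rough sketch of that stop-on-signal approach for the PySpark 1.5-era API.
The Kafka broker, topic name, and marker-file path are placeholders and not
from the original message, and the poll loop runs in the main thread here
rather than a dedicated stopping thread.)

    import os
    import time

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="graceful-stop-sketch")
    ssc = StreamingContext(sc, 10)  # 10-second batches (placeholder)

    # Direct (receiver-less) Kafka stream; broker and topic are placeholders.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "kafka-host:9092"})
    stream.map(lambda kv: kv[1]).pprint()

    ssc.start()

    # Poll for an external "stop" marker; when it appears, stop the streaming
    # context gracefully so in-flight batches finish before shutdown.
    MARKER = "/tmp/stop_streaming"  # placeholder path
    while not os.path.exists(MARKER):
        time.sleep(5)

    ssc.stop(True, True)  # stopSparkContext=True, stop gracefully=True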