Hi All,
My Spark Streaming jobs are filling up the disk within a short amount of
time (< 10 mins). I have 10 GB of disk space and it is getting full
at the SPARK_LOCAL_DIRS location. In my case SPARK_LOCAL_DIRS is set to
/usr/local/spark/temp.
There are a lot of files like this: input-0-1489072623
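A minimal sketch of the usual first step, assuming a larger volume is
available on every worker (the /data/spark-tmp path below is hypothetical,
not from this thread): point SPARK_LOCAL_DIRS at that volume in spark-env.sh,
since the receiver's input-* block files and shuffle data are written under it.

  # spark-env.sh on every worker -- hypothetical mount with more free space
  export SPARK_LOCAL_DIRS=/data/spark-tmp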
Hi all,
I was recently playing around with spark-env.sh, specifically SPARK_LOCAL_DIRS,
in order to add additional shuffle storage.
But since I did this, I am getting a "too many open files" error if the total
executor core count is high. I am also seeing low parallelism when monitoring
the running tasks on some
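A hedged sketch (not from this thread) of two things that commonly help with
"too many open files" during shuffles; the limit value is illustrative, and
spark.shuffle.consolidateFiles only applies to the hash-based shuffle in Spark 1.x:

  # on every worker: check and raise the per-process open-file limit
  # (make it permanent via /etc/security/limits.conf rather than the shell)
  ulimit -n
  ulimit -n 65536

  # spark-defaults.conf (Spark 1.x, hash-based shuffle only): fewer shuffle files per core
  spark.shuffle.consolidateFiles  true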
-----Original Message-----
From: Jan Rock [mailto:r...@do-hadoop.com]
Sent: Friday, 15 April 2016 18:04
To: Luca Guerra
Cc: user@spark.apache.org
Subject: Re: How many disks for spark_local_dirs?
Hi,
is it a physical server or AWS/Azure? What are the execution parameters for
the spark-shell command? Hadoop distro/version and Spark version?
>
> Thank you very much.
>
> Luca
>
> From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
> Sent: Friday, 15 April 2016 18:56
> To: Luca Guerra
> Cc: user @spark
> Subject: Re: How many disks for spark_local_dirs?
>
Is that 32 CPUs or 32 cores?
So in this configuration, assuming 32 cores, you have one worker with how much
memory (after deducting memory for the OS etc.) and 32 cores.
What is the ratio of memory per core in this case?
HTH
Dr Mich Talebzadeh
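As a hedged illustration of the ratio Mich is asking about (the numbers below
are assumed, not from this thread): on a worker with 64 GB of RAM, reserving
roughly 8 GB for the OS and other daemons leaves about 56 GB for Spark, so
with 32 cores the ratio is 56 GB / 32 ≈ 1.75 GB per core. Too low a ratio
means executors spill to SPARK_LOCAL_DIRS or hit OOM even while cores sit idle.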
Hi,
is it a physical server or AWS/Azure? What are the execution parameters for
the spark-shell command? Hadoop distro/version and Spark version?
Kind Regards,
Jan
> On 15 Apr 2016, at 16:15, luca_guerra wrote:
>
> Hi,
> I'm looking for a solution to improve my Spark cluster performances, I have
>
Hi,
I'm looking for a solution to improve my Spark cluster's performance. I have
read from http://spark.apache.org/docs/latest/hardware-provisioning.html:
"We recommend having 4-8 disks per node". I have tried with both one and two
disks, but I have seen that with 2 disks the execution time is doubled.
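A hedged sketch of the layout that page describes (the mount points are
hypothetical): each entry in the comma-separated SPARK_LOCAL_DIRS list should
sit on a separate physical disk; if both directories live on the same spinning
disk, the second entry mostly adds seek contention, which could explain a
slowdown rather than a speedup.

  # spark-env.sh -- one local directory per physical disk (hypothetical mounts)
  export SPARK_LOCAL_DIRS=/mnt/disk1/spark-local,/mnt/disk2/spark-local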
> Also, it can be a problem when reusing the same sparkcontext for many runs.
That is what happens to me. We use spark jobserver and use one SparkContext for
all jobs. SPARK_LOCAL_DIRS is not cleaned up and is eating disk space
quickly.
Ningjun
From: Marius Soutier [mailto:mps
Right, I remember now, the only problematic case is when things go bad
and the cleaner is not executed.
Also, it can be a problem when reusing the same sparkcontext for many runs.
Guillaume
It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned automatically.
From the source code comments:
// SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the
// application finishes.
> On 13.04.2015, at 11:26, Guillaume Pitel wrote:
>
> Does it also clea
Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up?
I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are
shuffled, Spark writes to this folder. I found that the disk space of
this folder keeps increasing quickly and at a certain point I will run
out of disk space.
I wonder, does Spark clean up the disk space in this folder once the
>
> Thanks
> Ningjun
>
> From: Wang, Ningjun (LNG-NPV)
> Sent: Thursday, April 02, 2015 12:14 PM
> To: user@spark.apache.org
> Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up?
>
> I set SPARK_LOCAL_DIRS to C:\t
Hi,
I had to setup a cron job for cleanup in $SPARK_HOME/work and in
$SPARK_LOCAL_DIRS.
Here are the cron lines. Unfortunately it's for *nix machines, I guess
you will have to adapt it seriously for Windows.
12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {} \;
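The matching entry for $SPARK_LOCAL_DIRS was cut off in the archive; an
analogous line following the same pattern might look like this (a sketch,
not the original line):

  # sketch -- same age-based cleanup applied to the Spark local dirs
  42 * * * * find $SPARK_LOCAL_DIRS -cmin +1440 -prune -exec rm -rf {} \;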
Does anybody have an answer for this?
Thanks
Ningjun
From: Wang, Ningjun (LNG-NPV)
Sent: Thursday, April 02, 2015 12:14 PM
To: user@spark.apache.org
Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up?
I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark
writes to this folder. I found that the disk space of this folder keeps
increasing quickly and at a certain point I will run out of disk space.
I wonder, does Spark clean up the disk space in this folder once the
Hi,
What is the difference between SPARK_LOCAL_DIRS and SPARK_WORKER_DIR? Also,
does Spark clean these up after execution?
Regards,
Gaurav
> ...you want to be able to monitor the file
> outputs as they occur you can also use HDFS (assuming your Spark nodes are
> also HDFS members they will benefit from data locality).
>
> It looks like the problem you are seeing is that a lock cannot be
> acquired on the output file in the central file system.
>
> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein wrote:
Hi,
Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a different
path than the local directory.
On our cluster we have a folder for temporary files (in a central file
system), which is called /scratch.
When setting SPARK_LOCAL_DIRS=/scratch/
I get:
An error occurred while calling
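A hedged sketch of the workaround implied by the reply above (the path is
hypothetical): keep SPARK_LOCAL_DIRS on a directory that exists on each node's
local disk rather than on the shared /scratch file system, so nodes never
contend for locks on the same files.

  # spark-env.sh on every node -- node-local scratch space, not the shared mount
  export SPARK_LOCAL_DIRS=/local/scratch/spark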
I'm using spark 1.1 and the provided ec2 scripts to start my cluster
(r3.8xlarge machines). From the spark-shell, I can verify that the environment
variables are set
scala> System.getenv("SPARK_LOCAL_DIRS")
res0: String = /mnt/spark,/mnt2/spark
However, when I look o
Actually I faced it yesterday...
I had to put it in spark-env.sh and take it out of spark-defaults.conf on
1.0.1... Note that this setting should be visible on all workers.
After that I validated that SPARK_LOCAL_DIRS was indeed getting used for
shuffling...
On Thu, Aug 14, 2014 at 10:27 AM
=/spark/spill
Associated warning:
14/08/14 10:10:39 WARN SparkConf: In Spark 1.0 and later spark.local.dir
will be overridden by the value set by the cluster manager (via
SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN).
14/08/14 10:10:39 WARN SparkConf:
SPARK_JAVA_OPTS was detected (
tions regardless of their age. This is tracked at
https://issues.apache.org/jira/browse/SPARK-1860
On Wed, Aug 13, 2014 at 9:47 PM, Debasish Das
wrote:
> Hi,
>
> I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark
> can use more shuffle space...
>
> Does Spark
Hi,
I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark can
use more shuffle space...
Does Spark clean all the shuffle files once the runs are done? It seems to
me that the shuffle files are not cleaned...
Do I need to set this variable? spark.cleaner.ttl
Right now we are
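As a hedged sketch of the knobs available in Spark 1.x standalone mode
(property names from the standalone docs; the values are illustrative): the
worker can periodically delete old application directories, while shuffle files
under SPARK_LOCAL_DIRS are normally removed only when the application exits;
spark.cleaner.ttl is a blunter, time-based alternative.

  # spark-env.sh on each standalone worker -- clean app dirs older than a day,
  # checking every 30 minutes
  SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true -Dspark.worker.cleanup.interval=1800 -Dspark.worker.cleanup.appDataTtl=86400"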
// assuming Spark 1.0
Hi Baoqiang,
In my experience for the standalone cluster you need to set
SPARK_WORKER_DIR not SPARK_LOCAL_DIRS to control where shuffle files are
written. I think this is a documentation issue that could be improved, as
http://spark.apache.org/docs/latest/spark
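A hedged sketch of how the two variables are described in the standalone docs
(both paths hypothetical), which also speaks to Gaurav's earlier question:
SPARK_WORKER_DIR holds each application's work directory (jars, executor logs
and scratch space), while SPARK_LOCAL_DIRS is meant for Spark's scratch files
such as shuffle output and spilled data; as the messages above suggest, which
one actually receives the shuffle files can depend on the version and deploy mode.

  # spark-env.sh -- hypothetical paths on a standalone worker
  export SPARK_WORKER_DIR=/mnt/data/spark-work
  export SPARK_LOCAL_DIRS=/mnt/data/spark-local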
Hi
I'm trying to use a specific dir for the Spark working directory since I have
limited space at /tmp. I tried:
1)
export SPARK_LOCAL_DIRS="/mnt/data/tmp"
or 2)
SPARK_LOCAL_DIRS="/mnt/data/tmp" in spark-env.sh
But neither worked, since the output of Spark still says
ERROR