Spark Jobs filling up the disk at SPARK_LOCAL_DIRS location

2017-03-09 Thread kant kodali
Hi All, My Spark Streaming jobs are filling up the disk within a short time (< 10 minutes). I have 10 GB of disk space and it is getting full at the SPARK_LOCAL_DIRS location. In my case SPARK_LOCAL_DIRS is set to /usr/local/spark/temp. There are a lot of files like this: input-0-1489072623
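
These input-0-* files appear to be Spark Streaming receiver blocks written under the executors' local directories; a common first step is simply to point SPARK_LOCAL_DIRS at a volume with more headroom. A minimal spark-env.sh sketch, with an illustrative path not taken from the thread:

    # spark-env.sh on every worker node -- /data/spark-scratch is an example path
    export SPARK_LOCAL_DIRS=/data/spark-scratch   # a volume larger than the 10 GB disk mentioned above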

multiple SPARK_LOCAL_DIRS causing strange behavior in parallelism

2016-07-29 Thread Saif.A.Ellafi
Hi all, I have been playing around with SPARK_LOCAL_DIRS in spark-env.sh in order to add additional shuffle storage. But since I did this, I am getting a "too many open files" error if the total executor cores is high. I am also getting low parallelism when monitoring the running tasks on some
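
A "too many open files" error during shuffles usually points at the per-process file-descriptor limit on the worker machines rather than at SPARK_LOCAL_DIRS itself; a hedged sketch of checking and raising it (user name and values are illustrative):

    # as the user that runs the executors
    ulimit -n                      # current soft limit on open file descriptors
    # raise it persistently, e.g. in /etc/security/limits.conf:
    #   spark  soft  nofile  65536
    #   spark  hard  nofile  65536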

Re: How many disks for spark_local_dirs?

2016-04-18 Thread Luca Guerra
-----Original message----- From: Jan Rock [mailto:r...@do-hadoop.com] Sent: Friday, 15 April 2016 18:04 To: Luca Guerra Cc: user@spark.apache.org Subject: Re: How many disks for spark_local_dirs? Hi, is it a physical server or AWS/Azure? What are the parameters passed to the spark-shell command? Hadoop di

Re: How many disks for spark_local_dirs?

2016-04-18 Thread Mich Talebzadeh
> > Thank you very much. > > Luca > > > > *From:* Mich Talebzadeh [mailto:mich.talebza...@gmail.com] > *Sent:* Friday, 15 April 2016 18:56 > *To:* Luca Guerra > *Cc:* user @spark > *Subject:* Re: How many disks for spark_local_dirs? > > > > Is that

Re: How many disks for spark_local_dirs?

2016-04-18 Thread Luca Guerra
: user @spark Subject: Re: How many disks for spark_local_dirs? Is that 32 CPUs or 32 cores? So in this configuration, assuming 32 cores, you have 1 worker with how much memory (deducting memory for the OS etc.) and 32 cores. What is the ratio of memory per core in this case? HTH Dr Mich Talebzadeh

Re: How many disks for spark_local_dirs?

2016-04-15 Thread Jan Rock
Hi, is it a physical server or AWS/Azure? What are the parameters passed to the spark-shell command? Hadoop distro/version and Spark version? Kind Regards, Jan > On 15 Apr 2016, at 16:15, luca_guerra wrote: > > Hi, > I'm looking for a solution to improve my Spark cluster's performance, I have >

Re: How many disks for spark_local_dirs?

2016-04-15 Thread Mich Talebzadeh
Is that 32 CPUs or 32 cores? So in this configuration, assuming 32 cores, you have 1 worker with how much memory (deducting memory for the OS etc.) and 32 cores. What is the ratio of memory per core in this case? HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh

How many disks for spark_local_dirs?

2016-04-15 Thread luca_guerra
Hi, I'm looking for a solution to improve my Spark cluster's performance. I have read at http://spark.apache.org/docs/latest/hardware-provisioning.html: "We recommend having 4-8 disks per node". I have tried both with one and with two disks, but I have seen that with 2 disks the execution time is double
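
The hardware-provisioning advice assumes the comma-separated entries in SPARK_LOCAL_DIRS sit on separate physical disks, so shuffle I/O is spread across spindles; two directories on the same disk mostly add seek contention. A spark-env.sh sketch under that assumption (mount points are illustrative):

    # spark-env.sh -- one scratch directory per physical disk (example mounts)
    export SPARK_LOCAL_DIRS=/mnt/disk1/spark-local,/mnt/disk2/spark-local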

RE: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-14 Thread Wang, Ningjun (LNG-NPV)
> Also, it can be a problem when reusing the same SparkContext for many runs. That is what happened to me. We use spark-jobserver and use one SparkContext for all jobs. SPARK_LOCAL_DIRS is not cleaned up and is eating disk space quickly. Ningjun From: Marius Soutier [mailto:mps

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-14 Thread Marius Soutier
the only problematic case is when things go bad and > the cleaner is not executed. > > Also, it can be a problem when reusing the same SparkContext for many runs. > > Guillaume >> It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned >> automati

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-14 Thread Guillaume Pitel
Right, I remember now, the only problematic case is when things go bad and the cleaner is not executed. Also, it can be a problem when reusing the same SparkContext for many runs. Guillaume It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned automatically. From the source code

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-14 Thread Marius Soutier
It cleans the work dir, and SPARK_LOCAL_DIRS should be cleaned automatically. From the source code comments: // SPARK_LOCAL_DIRS environment variable, and deleted by the Worker when the // application finishes. > On 13.04.2015, at 11:26, Guillaume Pitel wrote: > > Does it also clea

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-13 Thread Guillaume Pitel
: Is the disk space in SPARK_LOCAL_DIRS cleaned up? I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark writes to this folder. I found that the disk space of this folder keeps increasing quickly and at a certain point I will run out of disk space. I wonder whether Spark cleans up the di

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-13 Thread Marius Soutier
> > Thanks > Ningjun > > From: Wang, Ningjun (LNG-NPV) > Sent: Thursday, April 02, 2015 12:14 PM > To: user@spark.apache.org > Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up? > > I set SPARK_LOCAL_DIRS to C:\t

Re: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-10 Thread Guillaume Pitel
Hi, I had to set up a cron job for cleanup in $SPARK_HOME/work and in $SPARK_LOCAL_DIRS. Here are the cron lines. Unfortunately they are for *nix machines; I guess you will have to adapt them seriously for Windows. 12 * * * * find $SPARK_HOME/work -cmin +1440 -prune -exec rm -rf {}
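
For reference, a cleanup cron along the lines of the fragment above might look like the sketch below; the 24-hour retention, the single local directory, and the second line are assumptions for illustration, not a restoration of the original message:

    # crontab on each worker -- assumes SPARK_HOME is defined in the crontab environment
    12 * * * * find $SPARK_HOME/work -mindepth 1 -cmin +1440 -prune -exec rm -rf {} \;
    # assuming SPARK_LOCAL_DIRS is a single directory such as /data/spark-local;
    # a comma-separated list would need one line per path
    32 * * * * find /data/spark-local -mindepth 1 -cmin +1440 -prune -exec rm -rf {} \;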

RE: Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-10 Thread Wang, Ningjun (LNG-NPV)
Does anybody have an answer for this? Thanks Ningjun From: Wang, Ningjun (LNG-NPV) Sent: Thursday, April 02, 2015 12:14 PM To: user@spark.apache.org Subject: Is the disk space in SPARK_LOCAL_DIRS cleaned up? I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark

Is the disk space in SPARK_LOCAL_DIRS cleaned up?

2015-04-02 Thread Wang, Ningjun (LNG-NPV)
I set SPARK_LOCAL_DIRS to C:\temp\spark-temp. When RDDs are shuffled, Spark writes to this folder. I found that the disk space of this folder keeps increasing quickly and at a certain point I will run out of disk space. I wonder whether Spark cleans up the disk space in this folder once the

SPARK_LOCAL_DIRS and SPARK_WORKER_DIR

2015-02-11 Thread gtinside
Hi, What is the difference between SPARK_LOCAL_DIRS and SPARK_WORKER_DIR? Also, does Spark clean these up after execution? Regards, Gaurav -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-LOCAL-DIRS-and-SPARK-WORKER-DIR-tp21612.html Sent from
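
In rough terms: SPARK_LOCAL_DIRS is the scratch space executors use for shuffle output and spilled data, while SPARK_WORKER_DIR is the standalone worker's working directory, which holds per-application folders with logs and downloaded jars. A spark-env.sh sketch with illustrative paths:

    # spark-env.sh (standalone mode) -- paths are examples
    export SPARK_LOCAL_DIRS=/data/spark-scratch   # shuffle files and spills
    export SPARK_WORKER_DIR=/data/spark-work      # per-application logs and jars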

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
e you want to be able to monitor the file >>>> outputs as they occur you can also use HDFS (assuming your Spark nodes are >>>> also HDFS members they will benefit from data locality). >>>> >>>> It looks like the problem you are seeing is that a lock can

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
HDFS members they will benefit from data locality). >>> >>> It looks like the problem you are seeing is that a lock cannot be >>> acquired on the output file in the central file system. >>> >>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein wrote: >>>

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
nefit from data locality). >>> >>> It looks like the problem you are seeing is that a lock cannot be >>> acquired on the output file in the central file system. >>> >>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein wrote: >>> >>>> Hi, >

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
so HDFS members they will benefit from data locality). >> >> It looks like the problem you are seeing is that a lock cannot be >> acquired on the output file in the central file system. >> >> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein wrote: >> >>> Hi

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
11 2015 at 11:55:55 AM TJ Klein wrote: > >> Hi, >> >> Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a different >> path than the local directory. >> >> On our cluster we have a folder for temporary files (in a central file >> system), which is c

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
lock cannot be acquired on the output file in the central file system. On Wed Feb 11 2015 at 11:55:55 AM TJ Klein wrote: > Hi, > > Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a different > path than the local directory. > > On our cluster we have a folder for t

SPARK_LOCAL_DIRS Issue

2015-02-11 Thread TJ Klein
Hi, Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a different path than the local directory. On our cluster we have a folder for temporary files (in a central file system), which is called /scratch. When setting SPARK_LOCAL_DIRS=/scratch/ I get: An error occurred while calling
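
The lock errors discussed in the replies suggest that /scratch is a shared (NFS-style) mount, whereas SPARK_LOCAL_DIRS is meant to be node-local scratch space. A hedged sketch of the usual workaround, assuming every node has its own local disk (paths and user are illustrative):

    # run on every worker node
    mkdir -p /local/scratch/spark && chown spark: /local/scratch/spark
    # spark-env.sh
    export SPARK_LOCAL_DIRS=/local/scratch/spark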

ec2 script and SPARK_LOCAL_DIRS not created

2014-11-12 Thread Darin McBeath
I'm using Spark 1.1 and the provided ec2 scripts to start my cluster (r3.8xlarge machines). From the spark-shell, I can verify that the environment variable is set: scala> System.getenv("SPARK_LOCAL_DIRS") res0: String = /mnt/spark,/mnt2/spark However, when I look o
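
One quick check is whether the directories named by the variable actually exist and are writable on each node; a simple shell loop over the paths from the snippet above:

    for d in /mnt/spark /mnt2/spark; do
      if [ -d "$d" ] && [ -w "$d" ]; then echo "$d ok"; else echo "$d missing or not writable"; fi
    done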

Re: SPARK_LOCAL_DIRS

2014-08-14 Thread Debasish Das
Actually I faced this yesterday... I had to put it in spark-env.sh and take it out of spark-defaults.conf on 1.0.1... Note that this setting should be visible on all workers. After that I validated that SPARK_LOCAL_DIRS was indeed being used for shuffling... On Thu, Aug 14, 2014 at 10:27 AM
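
This matches the warning quoted in the next entry: on standalone and Mesos, the value set via SPARK_LOCAL_DIRS overrides spark.local.dir, so the env variable is the one that actually takes effect. A hedged sketch of the arrangement described above (path is illustrative):

    # spark-env.sh on every worker; spark.local.dir removed from spark-defaults.conf
    export SPARK_LOCAL_DIRS=/data/spark-shuffle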

SPARK_LOCAL_DIRS

2014-08-14 Thread Brad Miller
=/spark/spill *Associated warning:* 14/08/14 10:10:39 WARN SparkConf: In Spark 1.0 and later spark.local.dir will be overridden by the value set by the cluster manager (via SPARK_LOCAL_DIRS in mesos/standalone and LOCAL_DIRS in YARN). 14/08/14 10:10:39 WARN SparkConf: SPARK_JAVA_OPTS was detected (

Re: SPARK_LOCAL_DIRS option

2014-08-13 Thread Andrew Ash
tions regardless of their age. This is tracked at https://issues.apache.org/jira/browse/SPARK-1860 On Wed, Aug 13, 2014 at 9:47 PM, Debasish Das wrote: > Hi, > > I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark > can use more shuffle space... > > Does Spark
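
For the per-application directories under the worker, standalone mode also has documented cleanup settings that can be passed through SPARK_WORKER_OPTS; they only affect applications that have already stopped. A sketch with illustrative retention values:

    # spark-env.sh -- periodically purge finished applications' work dirs (example values)
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=86400"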

SPARK_LOCAL_DIRS option

2014-08-13 Thread Debasish Das
Hi, I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark can use more shuffle space... Does Spark clean all the shuffle files once the runs are done? It seems to me that the shuffle files are not cleaned... Do I need to set this variable? spark.cleaner.ttl Right now we are

Re: set SPARK_LOCAL_DIRS issue

2014-08-11 Thread Andrew Ash
// assuming Spark 1.0 Hi Baoqiang, In my experience, for a standalone cluster you need to set SPARK_WORKER_DIR, not SPARK_LOCAL_DIRS, to control where shuffle files are written. I think this is a documentation issue that could be improved, as http://spark.apache.org/docs/latest/spark

set SPARK_LOCAL_DIRS issue

2014-08-09 Thread Baoqiang Cao
Hi, I'm trying to use a specific dir as the Spark working directory since I have limited space at /tmp. I tried: 1) export SPARK_LOCAL_DIRS=“/mnt/data/tmp” or 2) SPARK_LOCAL_DIRS=“/mnt/data/tmp” in spark-env.sh. But neither worked, since the output of Spark still says ERROR
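
Two things worth checking here, both guesses from the snippet: the quotes around the path look like typographic (curly) quotes, which the shell treats as part of the value, and spark-env.sh is only re-read when the standalone daemons are restarted. A sketch with plain ASCII quotes:

    # spark-env.sh -- note the plain double quotes
    export SPARK_LOCAL_DIRS="/mnt/data/tmp"
    # restart the master and workers so the new environment is picked up
    $SPARK_HOME/sbin/stop-all.sh && $SPARK_HOME/sbin/start-all.sh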