Thanks. Yes, I think it might not always make sense to lock files,
particularly if every executor is getting its own path.
On Wed, Feb 11, 2015 at 2:31 PM, Charles Feduke wrote:
And just glancing at the Spark source code around where the stack trace
originates:
val lockFile = new File(localDir, lockFileName)
val raf = new RandomAccessFile(lockFile, "rw")
// Only one executor entry.
// The FileLock is only used to control synchronization for executors download file,
// it's always safe regardless of lock type (mandatory or advisory).
val lock = raf.getChannel().lock()
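As a quick way to check whether a shared file system such as Lustre supports this kind of lock at all, a small standalone probe along these lines might help. (A sketch only: LockProbe is a made-up name, and the default /scratch path is just the one mentioned in this thread.)

import java.io.{File, RandomAccessFile}

// Sketch: try to take the same kind of java.nio lock the Spark snippet
// above takes. File systems without lock support throw an IOException.
object LockProbe {
  def main(args: Array[String]): Unit = {
    val dir = new File(args.headOption.getOrElse("/scratch"))
    val lockFile = new File(dir, "spark_lock_probe")
    val raf = new RandomAccessFile(lockFile, "rw")
    try {
      val lock = raf.getChannel().lock()
      println(s"File locking works in $dir")
      lock.release()
    } catch {
      case e: Exception => println(s"File locking failed in $dir: $e")
    } finally {
      raf.close()
      lockFile.delete()
    }
  }
}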
Thanks a lot. I will have a look at it.
On Wed, Feb 11, 2015 at 2:20 PM, Charles Feduke wrote:
Take a look at this:
http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
(linked from that article)
to get a better idea of what your options are.
If it's possible to avoid writing to [any] disk I'd recommend that route.
Thanks for the info. The file system in use is a Lustre file system.
Best,
Tassilo
On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke wrote:
A central location, such as NFS?
If they are temporary for the purpose of further job processing you'll want
to keep them local to the node in the cluster, i.e., in /tmp. If they are
centralized you won't be able to take advantage of data locality and the
central file store will become a bottleneck.
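For example (a sketch; the paths are illustrative, not from this thread), pointing each worker at node-local disk in conf/spark-env.sh keeps shuffle and spill traffic off the shared store:

# Node-local scratch space for shuffle and spill files.
export SPARK_LOCAL_DIRS=/tmp/spark
# A comma-separated list spreads I/O across several local disks:
# export SPARK_LOCAL_DIRS=/disk1/spark,/disk2/spark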
Hi,
Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a path other
than the local directory.
On our cluster we have a folder for temporary files (in a central file
system), which is called /scratch.
When setting SPARK_LOCAL_DIRS=/scratch/
I get:
An error occurred while calling
z:o
Hi Baoqiang,
In my experience, for a standalone cluster you need to set
SPARK_WORKER_DIR, not SPARK_LOCAL_DIRS, to control where shuffle files are
written. I think this is a documentation issue that could be improved, as
http://spark.apache.org/docs/latest/spark-standalone.html does not make
this distinction clear.
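A sketch of what that looks like in conf/spark-env.sh on a standalone worker, following the advice above (the paths are illustrative):

# Standalone mode: per the experience above, this is what actually
# controls where shuffle files end up.
export SPARK_WORKER_DIR=/mnt/data/spark/work
# The scratch-directory setting the documentation points to.
export SPARK_LOCAL_DIRS=/mnt/data/spark/local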
Hi
I’m trying to use a specific dir for the Spark working directory since I have
limited space at /tmp. I tried:
1)
export SPARK_LOCAL_DIRS="/mnt/data/tmp"
or 2)
SPARK_LOCAL_DIRS="/mnt/data/tmp" in spark-env.sh
But neither worked, since the output of Spark still says:
ERROR DiskBlockObjectWriter
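For completeness, a minimal sketch of the application-level alternative: the spark.local.dir property covers the same setting, though SPARK_LOCAL_DIRS overrides it when that variable is set (the app name and path here are illustrative):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: set the scratch directory programmatically instead of via
// environment variables; SPARK_LOCAL_DIRS, if set, takes precedence.
val conf = new SparkConf()
  .setAppName("local-dir-example")
  .set("spark.local.dir", "/mnt/data/tmp")
val sc = new SparkContext(conf)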