Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
Thanks. Yes, I think it might not always make sense to lock files, particularly if every executor is getting its own path.

On Wed, Feb 11, 2015 at 2:31 PM, Charles Feduke wrote:
> And just glancing at the Spark source code around where the stack trace
> originates:
>
> val lockFile = new File(lo…

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
And just glancing at the Spark source code around where the stack trace originates:

    val lockFile = new File(localDir, lockFileName)
    val raf = new RandomAccessFile(lockFile, "rw")
    // Only one executor entry.
    // The FileLock is only used to control synchronization for executors dow…
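[Archive note: the locking pattern quoted above can be sketched in plain Java; this is a minimal, hypothetical equivalent (not Spark's actual code) showing why lock acquisition can fail on a filesystem without proper lock support, such as some Lustre mounts — the situation behind the stack trace in this thread.]

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;

public class LockSketch {
    // Open (or create) a lock file in localDir and try to take an
    // exclusive FileLock on it. On a local filesystem this normally
    // succeeds; on filesystems that do not support byte-range locks,
    // lock() throws IOException instead.
    public static boolean tryLock(File localDir, String lockFileName) {
        File lockFile = new File(localDir, lockFileName);
        try (RandomAccessFile raf = new RandomAccessFile(lockFile, "rw");
             FileLock lock = raf.getChannel().lock()) {
            return lock.isValid();
        } catch (IOException e) {
            // The filesystem refused the lock request.
            return false;
        }
    }

    public static void main(String[] args) {
        // On a node-local temp dir the lock should be granted.
        File dir = new File(System.getProperty("java.io.tmpdir"));
        System.out.println(tryLock(dir, "demo.lock"));
    }
}
```

If each executor writes under its own per-executor path, as Tassilo notes above, the lock adds little protection and becomes a pure liability on filesystems that cannot grant it.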

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
Thanks a lot. I will have a look at it.

On Wed, Feb 11, 2015 at 2:20 PM, Charles Feduke wrote:
> Take a look at this:
>
> http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
>
> Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
> (linked from that article)
>
> to get…

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
Take a look at this:

http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre

Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf (linked from that article)

to get a better idea of what your options are. If it's possible to avoid writing to [any] disk I'd recommend that route…

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Tassilo Klein
Thanks for the info. The file system in use is a Lustre file system.

Best,
Tassilo

On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke wrote:
> A central location, such as NFS?
>
> If they are temporary for the purpose of further job processing you'll
> want to keep them local to the node in the…

Re: SPARK_LOCAL_DIRS Issue

2015-02-11 Thread Charles Feduke
A central location, such as NFS?

If they are temporary for the purpose of further job processing you'll want to keep them local to the node in the cluster, i.e., in /tmp. If they are centralized you won't be able to take advantage of data locality and the central file store will become a bottleneck…

Re: SPARK_LOCAL_DIRS

2014-08-14 Thread Debasish Das
Actually I faced it yesterday... I had to put it in spark-env.sh and take it out of spark-defaults.conf on 1.0.1... Note that this setting should be visible on all workers. After that I validated that SPARK_LOCAL_DIRS was indeed getting used for shuffling...

On Thu, Aug 14, 2014 at 10:27 AM, …
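[Archive note: a sketch of what Debasish describes — setting SPARK_LOCAL_DIRS in conf/spark-env.sh rather than spark-defaults.conf. The paths are illustrative placeholders, not from the original message.]

```shell
# conf/spark-env.sh -- must be present (with the same setting) on every
# worker node, since each worker reads its own copy at startup.
# Example paths only; point these at real node-local disks, not a
# shared mount like NFS or Lustre.
export SPARK_LOCAL_DIRS=/mnt/disk1/spark-local,/mnt/disk2/spark-local
```

Listing several comma-separated directories spreads shuffle I/O across disks; each must be local to the node to preserve data locality.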

Re: SPARK_LOCAL_DIRS option

2014-08-13 Thread Andrew Ash
Hi Deb,

If you don't have long-running Spark applications (those taking more than spark.worker.cleanup.appDataTtl) then the TTL-based cleaner is a good solution. If however you have a mix of long-running and short-running applications, then the TTL-based solution will fail. It will clean up data…