Thanks. Yes, I think it might not always make sense to lock files,
particularly if every executor is getting its own path.
On Wed, Feb 11, 2015 at 2:31 PM, Charles Feduke wrote:
And just glancing at the Spark source code around where the stack trace
originates:

    val lockFile = new File(localDir, lockFileName)
    val raf = new RandomAccessFile(lockFile, "rw")
    // Only one executor entry.
    // The FileLock is only used to control synchronization for executors
    // downloading the file; it's safe regardless of lock type (mandatory
    // or advisory).
    val lock = raf.getChannel().lock()
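For reference, a minimal self-contained sketch of the same advisory-lock
pattern (the file and directory names here are hypothetical, not Spark's).
On file systems without full POSIX lock support, as discussed above, the
lock() call itself is where this can fail:

    import java.io.{File, RandomAccessFile}

    object LockSketch {
      def main(args: Array[String]): Unit = {
        val localDir = new File(sys.props("java.io.tmpdir"))
        val lockFile = new File(localDir, "demo.lock")
        val raf = new RandomAccessFile(lockFile, "rw")
        // Advisory lock: the first process to arrive holds it; others
        // block here until release, serializing access to the shared file.
        val lock = raf.getChannel.lock()
        try {
          println(s"Holding lock on ${lockFile.getPath}")
          // ... fetch or write the shared file while holding the lock ...
        } finally {
          lock.release()
          raf.close()
        }
      }
    }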
Thanks a lot. I will have a look at it.
On Wed, Feb 11, 2015 at 2:20 PM, Charles Feduke wrote:
Take a look at this:
http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
(linked from that article)
to get a better idea of what your options are.
If it's possible to avoid writing to [any] disk, I'd recommend that route.
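Following up on the "avoid writing to disk" suggestion: a minimal sketch,
assuming an existing SparkContext `sc` and an illustrative input path, of
keeping intermediate results in executor memory rather than spilling them
to the shared file system:

    import org.apache.spark.storage.StorageLevel

    val lines = sc.textFile("hdfs:///data/input")   // illustrative path
    // MEMORY_ONLY keeps the parsed records in executor memory; nothing is
    // written to local or shared disk for this RDD.
    val parsed = lines.map(_.split(",")).persist(StorageLevel.MEMORY_ONLY)
    parsed.count()  // materializes the cached RDD

Note that shuffle output still goes to the local scratch directories, so
this only avoids explicit writes, not shuffle spill.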
Thanks for the info. The file system in use is a Lustre file system.
Best,
Tassilo
On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke wrote:
A central location, such as NFS?
If they are temporary for the purpose of further job processing, you'll want
to keep them local to the node in the cluster, i.e., in /tmp. If they are
centralized you won't be able to take advantage of data locality and the
central file store will become a bottleneck.
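A sketch of pointing scratch space at a node-local directory from
application code (the /tmp path is illustrative; spark.local.dir is the
standard setting for scratch and shuffle space):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("local-scratch-example")
      .setMaster("spark://master:7077")             // illustrative master URL
      .set("spark.local.dir", "/tmp/spark-scratch") // node-local, not NFS
    val sc = new SparkContext(conf)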
Actually I faced it yesterday...
I had to put it in spark-env.sh and take it out of spark-defaults.conf on
1.0.1... Note that this setting needs to be visible on all workers.
After that I validated that SPARK_LOCAL_DIRS was indeed getting used for
shuffling...
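For reference, a sketch of what that looks like in conf/spark-env.sh on
each worker node (the path is illustrative):

    # conf/spark-env.sh -- must be present on every worker node
    export SPARK_LOCAL_DIRS=/lscratch/spark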
On Thu, Aug 14, 2014 at 10:27 AM,
Hi Deb,
If you don't have long-running Spark applications (those taking more than
spark.worker.cleanup.appDataTtl) then the TTL-based cleaner is a good
solution. If however you have a mix of long-running and short-running
applications, then the TTL-based solution will fail: it will clean up data
that the long-running applications still need.
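For reference, a sketch of enabling the TTL-based cleaner on standalone
workers via conf/spark-env.sh (the values are illustrative; the properties
are the documented spark.worker.cleanup.* settings):

    # conf/spark-env.sh -- TTL-based cleanup of finished applications' dirs
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true \
      -Dspark.worker.cleanup.interval=1800 \
      -Dspark.worker.cleanup.appDataTtl=604800"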