A central location, such as NFS?

If they are temporary files used for further job processing, you'll want to
keep them local to each node in the cluster, e.g., in /tmp. If they are
centralized, you won't be able to take advantage of data locality, and the
central file store will become a bottleneck for further processing.
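You can point the scratch space at node-local disk either with
SPARK_LOCAL_DIRS on each worker or with the spark.local.dir config. A rough
PySpark sketch (the path is just an example, and note that in Spark 1.x a
SPARK_LOCAL_DIRS environment variable set on a worker overrides
spark.local.dir):

    from pyspark import SparkConf, SparkContext

    # spark.local.dir is the config equivalent of SPARK_LOCAL_DIRS;
    # point it at a node-local disk (this path is just an example)
    conf = (SparkConf()
            .setAppName("local-scratch-example")
            .set("spark.local.dir", "/tmp/spark-scratch"))
    sc = SparkContext(conf=conf)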

If /tmp isn't an option because you want to be able to monitor the file
outputs as they occur, you can also use HDFS (assuming your Spark nodes are
also HDFS members, they will benefit from data locality).
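For example (a minimal PySpark sketch; the namenode host/port and paths are
placeholders for your cluster, and sc is the SparkContext from above):

    # placeholders: substitute your namenode host/port and paths
    rdd = sc.textFile("hdfs://namenode:8020/data/input")
    result = rdd.map(lambda line: line.upper())

    # each partition is written by the node that computed it, so with
    # co-located HDFS datanodes the first replica lands on local disk
    result.saveAsTextFile("hdfs://namenode:8020/data/output")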

It looks like the problem you are seeing is that a lock cannot be acquired
on the output file in the central file system: the "Function not implemented"
IOException in your trace means the file system itself doesn't support the
locking call that Spark's Utils.fetchFile makes.
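You can confirm that from any worker node by trying to take a POSIX lock on a
file under /scratch, which is the same kind of lock FileChannel.lock() takes
under the hood on Linux. A quick Python check (the path is a placeholder):

    import fcntl

    # hypothetical path on the shared /scratch file system
    path = "/scratch/mynode/locktest"

    with open(path, "w") as f:
        try:
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
            print("locking works on this file system")
            fcntl.lockf(f, fcntl.LOCK_UN)
        except IOError as e:
            # errno 38 (ENOSYS, "Function not implemented") would match
            # the trace below
            print("locking failed: %s" % e)

If that fails with "Function not implemented", the fix is to point
SPARK_LOCAL_DIRS at a file system that does support locking (local disk),
as you've already found with /tmp.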

On Wed Feb 11 2015 at 11:55:55 AM TJ Klein <tjkl...@gmail.com> wrote:

> Hi,
>
> Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a different
> path than the local directory.
>
> On our cluster we have a folder for temporary files (in a central file
> system), which is called /scratch.
>
> When setting SPARK_LOCAL_DIRS=/scratch/<node name>
>
> I get:
>
>  An error occurred while calling
> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0
> in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0
> (TID 3, XXXXXXX): java.io.IOException: Function not implemented
>         at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
>         at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
>         at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
>         at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
>
> Using SPARK_LOCAL_DIRS=/tmp, however, works perfectly. Any idea?
>
> Best,
>  Tassilo
