Thanks a lot. I will have a look at it.

On Wed, Feb 11, 2015 at 2:20 PM, Charles Feduke <charles.fed...@gmail.com> wrote:
> Take a look at this:
>
> http://wiki.lustre.org/index.php/Running_Hadoop_with_Lustre
>
> Particularly: http://wiki.lustre.org/images/1/1b/Hadoop_wp_v0.4.2.pdf
> (linked from that article)
>
> to get a better idea of what your options are.
>
> If it's possible to avoid writing to [any] disk I'd recommend that route,
> since that's the performance advantage Spark has over vanilla Hadoop.
>
> On Wed Feb 11 2015 at 2:10:36 PM Tassilo Klein <tjkl...@gmail.com> wrote:
>
>> Thanks for the info. The file system in use is a Lustre file system.
>>
>> Best,
>> Tassilo
>>
>> On Wed, Feb 11, 2015 at 12:15 PM, Charles Feduke <
>> charles.fed...@gmail.com> wrote:
>>
>>> A central location, such as NFS?
>>>
>>> If they are temporary for the purpose of further job processing you'll
>>> want to keep them local to the node in the cluster, i.e., in /tmp. If
>>> they are centralized you won't be able to take advantage of data
>>> locality, and the central file store will become a bottleneck for
>>> further processing.
>>>
>>> If /tmp isn't an option because you want to be able to monitor the file
>>> outputs as they occur, you can also use HDFS (assuming your Spark nodes
>>> are also HDFS members, they will benefit from data locality).
>>>
>>> It looks like the problem you are seeing is that a lock cannot be
>>> acquired on the output file in the central file system.
>>>
>>> On Wed Feb 11 2015 at 11:55:55 AM TJ Klein <tjkl...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Using Spark 1.2 I ran into issues setting SPARK_LOCAL_DIRS to a
>>>> path other than the local directory.
>>>>
>>>> On our cluster we have a folder for temporary files (in a central
>>>> file system), which is called /scratch.
>>>>
>>>> When setting SPARK_LOCAL_DIRS=/scratch/<node name>
>>>>
>>>> I get:
>>>>
>>>> An error occurred while calling
>>>> z:org.apache.spark.api.python.PythonRDD.newAPIHadoopFile.
>>>> : org.apache.spark.SparkException: Job aborted due to stage failure:
>>>> Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3
>>>> in stage 0.0 (TID 3, XXXXXXX): java.io.IOException: Function not
>>>> implemented
>>>>         at sun.nio.ch.FileDispatcherImpl.lock0(Native Method)
>>>>         at sun.nio.ch.FileDispatcherImpl.lock(FileDispatcherImpl.java:91)
>>>>         at sun.nio.ch.FileChannelImpl.lock(FileChannelImpl.java:1022)
>>>>         at java.nio.channels.FileChannel.lock(FileChannel.java:1052)
>>>>         at org.apache.spark.util.Utils$.fetchFile(Utils.scala:379)
>>>>
>>>> Using SPARK_LOCAL_DIRS=/tmp, however, works perfectly. Any idea?
>>>>
>>>> Best,
>>>> Tassilo
>>>>
>>>> --
>>>> View this message in context:
>>>> http://apache-spark-user-list.1001560.n3.nabble.com/SPARK-LOCAL-DIRS-Issue-tp21602.html
>>>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: user-h...@spark.apache.org
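[Editor's note: the `java.io.IOException: Function not implemented` raised from `FileChannel.lock` suggests the Lustre mount does not support POSIX file locking, which `Utils.fetchFile` relies on. A quick way to check a candidate SPARK_LOCAL_DIRS location before pointing Spark at it is to probe it with `flock(1)`. This is a generic sketch, not from the thread; it assumes a Linux node with util-linux's `flock` installed, and the path `/scratch/$(hostname)` is only an example taken from the setup described above.]

```shell
#!/bin/sh
# Probe whether a directory's file system supports file locking.
# Spark's Utils.fetchFile acquires a lock on fetched files, so a
# SPARK_LOCAL_DIRS location must support this; some Lustre mounts
# (without the "flock" mount option) do not.
candidate="${1:-/tmp}"          # e.g. ./probe-lock.sh /scratch/$(hostname)
probe="$candidate/.spark-lock-probe.$$"

# flock(1) creates the file if needed, takes a non-blocking exclusive
# lock, and runs the given command while holding it.
if flock -n "$probe" true 2>/dev/null; then
    echo "file locking OK on $candidate"
else
    echo "file locking NOT supported on $candidate"
fi
rm -f "$probe"
```

If the probe fails on the Lustre path, the usual options are remounting Lustre with the `flock` option or keeping SPARK_LOCAL_DIRS on node-local storage such as /tmp, as the thread concludes.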