As long as the filesystem is mounted at the same path on every node, you should be able to just run Spark and use a file:// URL for your files.
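To make that concrete, here is a minimal sketch of a Spark job reading from and writing to a shared mount via file:// URLs. It assumes Spark's 2014-era RDD API; the path /lustre/data and the app name are hypothetical, and the only Lustre-specific requirement is that the mount point is identical on every node.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedFsWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SharedFsWordCount")
    val sc = new SparkContext(conf)

    // The file:// scheme tells Spark to read from the local filesystem
    // (here, a globally mounted one) rather than HDFS.
    // /lustre/data/input.txt is a hypothetical path.
    val lines = sc.textFile("file:///lustre/data/input.txt")

    val counts = lines
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    // Output also lands on the shared filesystem, visible from every node.
    counts.saveAsTextFile("file:///lustre/data/output")

    sc.stop()
  }
}
```

Because the path resolves identically on every executor, no HDFS layer is needed; the trade-off, as noted below, is that Spark gets no data-locality hints.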
The only downside of running it this way is that Lustre won't expose data-locality information to Spark the way HDFS does. That may not matter much for a network-mounted filesystem, though.

Matei

On Apr 4, 2014, at 4:56 PM, Venkat Krishnamurthy <ven...@yarcdata.com> wrote:

> All
>
> Are there any drawbacks or technical challenges (or any information, really)
> related to using Spark directly on a global parallel filesystem like
> Lustre/GPFS?
>
> Any idea of what would be involved in doing a minimal proof of concept? Is it
> possible to run Spark unmodified (without the HDFS substrate) for a start, or
> will that not work at all? I do know that it's possible to implement Tachyon
> on Lustre and get the HDFS interface – just looking at other options.
>
> Venkat