As long as the filesystem is mounted at the same path on every node, you should 
be able to just run Spark and use a file:// URL for your files.

The only downside with running it this way is that Lustre won’t expose data 
locality info to Spark, the way HDFS does. That may not matter if it’s a 
network-mounted file system though.
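As a minimal sketch of what that looks like in practice (the mount point, master URL, and file names below are placeholders, not anything from this thread):

```shell
# Hypothetical proof-of-concept, assuming the parallel filesystem is mounted
# at the same path (here /lustre/shared) on every node in the cluster.
./bin/spark-shell --master spark://master:7077

# Then, inside the shell, read and write with file:// URLs instead of hdfs://:
#   val lines = sc.textFile("file:///lustre/shared/input.txt")
#   lines.count()
#   lines.saveAsTextFile("file:///lustre/shared/output")
```

Because every worker resolves the same path to the same data, no HDFS layer is needed; the trade-off is the missing locality scheduling mentioned above.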

Matei

On Apr 4, 2014, at 4:56 PM, Venkat Krishnamurthy <ven...@yarcdata.com> wrote:

> All
> 
> Are there any drawbacks or technical challenges (or any information, really) 
> related to using Spark directly on a global parallel filesystem like 
> Lustre/GPFS? 
> 
> Any idea of what would be involved in doing a minimal proof of concept? Is it 
> possible to just run Spark unmodified (without the HDFS substrate) for a 
> start, or will that not work at all? I do know that it’s possible to 
> implement Tachyon on Lustre and get the HDFS interface – just looking at 
> other options.
> 
> Venkat
