All,

Are there any drawbacks or technical challenges (or any information, really) related to running Spark directly on a global parallel filesystem such as Lustre or GPFS?
Any idea what would be involved in building a minimal proof of concept? Is it possible to simply run Spark unmodified (without the HDFS substrate) as a starting point, or will that not work at all? I know it's possible to deploy Tachyon on Lustre and get the HDFS interface that way; I'm just looking at other options.

Venkat