We recently released an object store connector for Spark: https://github.com/SparkTC/stocator. Currently the connector contains a driver for Swift-based object stores (such as SoftLayer or any other Swift cluster), but it can easily support additional object stores. There is a pending patch to add support for the Amazon S3 object store.
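To give a flavor of what using such a connector looks like, here is a minimal sketch of writing and reading data from Spark against a Swift-backed store. The swift2d:// scheme, the implementation class, and the configuration keys below are illustrative assumptions from memory; the README in the repository above is the authoritative reference for setup.

import org.apache.spark.{SparkConf, SparkContext}

object StocatorExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("stocator-example"))

    // Register the connector as the Hadoop FileSystem for the "swift2d" scheme.
    // NOTE: the scheme, class name, and per-service keys are assumptions --
    // consult the Stocator README for the exact configuration.
    val hconf = sc.hadoopConfiguration
    hconf.set("fs.swift2d.impl", "com.ibm.stocator.fs.ObjectStoreFileSystem")
    hconf.set("fs.swift2d.service.myservice.auth.url", "https://<keystone-endpoint>/v3/auth/tokens")
    hconf.set("fs.swift2d.service.myservice.username", "<username>")
    hconf.set("fs.swift2d.service.myservice.password", "<password>")

    // Write an RDD directly to a container: each task writes its part as a
    // final object, with no intermediate temporary files or rename step.
    val data = sc.parallelize(1 to 1000).map(i => s"record-$i")
    data.saveAsTextFile("swift2d://mycontainer.myservice/output/records")

    // Read the data back from the object store.
    val readBack = sc.textFile("swift2d://mycontainer.myservice/output/records")
    println(s"count = ${readBack.count()}")

    sc.stop()
  }
}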
The major highlight is that this connector doesn't create any temporary files, so it achieves very fast response times when Spark persists data in the object store. The new connector supports speculative execution and covers various failure scenarios (such as two Spark tasks writing to the same object, or partially corrupted data due to runtime exceptions in the Spark master). It also addresses https://issues.apache.org/jira/browse/SPARK-10063 and other known issues. The detailed fault-tolerance algorithm will be published very soon; for now, those who are interested can review the implementation in the code itself.

https://github.com/SparkTC/stocator contains all the details on how to set it up and use it with Spark. A series of tests showed that the new connector achieves about 70% improvement for write operations from Spark to Swift and about 30% improvement for read operations from Swift into Spark, compared to the existing driver that Spark uses to integrate with objects stored in Swift.

There is ongoing work to add more coverage and fix some known bugs and limitations.

All the best,
Gil