FYI, a colleague in the Spark team sat down to address the long-standing and neglected "Add some Abortable.abort() interface for streams etc which can be terminated"

https://issues.apache.org/jira/browse/HADOOP-16906

PR: https://github.com/apache/hadoop/pull/2667

With markdown to go with it, and some tuning of the API/S3A implementation:
https://github.com/apache/hadoop/pull/2684

We're happy with this. As well as working in the S3A stream, it should work with any object store whose output only becomes visible after close(). Obviously this excludes HDFS, file:// and the Azure stores: anything where create() creates the file, hflush() flushes to it, etc.

For Spark and similar, this will enable checkpointing direct to S3 or any other store whose stream implements the same interface. You don't need to write to a temp location; you can write to the final destination, over the existing data, and then choose whether to actually complete or abort the write.

Comments welcome

-Steve
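
For anyone who wants the commit-or-abort semantics spelled out, here is a minimal, self-contained sketch. It is an in-memory model only: AbortableSketch, BufferedObjectStream, STORE, checkpoint() and read() are all hypothetical names invented for illustration, not the Hadoop classes or signatures from the PRs above. The only behaviour it assumes is what the mail describes: output becomes visible on close(), and abort() discards it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory model of an object store; NOT the Hadoop API. */
public class AbortableSketch {

  /** Stand-in for the store: object key mapped to its visible contents. */
  static final Map<String, byte[]> STORE = new HashMap<>();

  /** Hypothetical stream: data is buffered and only published by close(). */
  static class BufferedObjectStream extends OutputStream {
    private final String key;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private boolean finished = false;

    BufferedObjectStream(String key) {
      this.key = key;
    }

    @Override
    public void write(int b) throws IOException {
      if (finished) {
        throw new IOException("stream already closed or aborted: " + key);
      }
      buffer.write(b);
    }

    /** Discard the buffered data; the destination is left untouched. */
    public void abort() {
      finished = true;
    }

    /** Publish the buffered data unless the write was already aborted or closed. */
    @Override
    public void close() {
      if (!finished) {
        finished = true;
        STORE.put(key, buffer.toByteArray());
      }
    }
  }

  /** Write straight to the final key; the data is published only if commit is true. */
  static void checkpoint(String key, String data, boolean commit) throws IOException {
    BufferedObjectStream out = new BufferedObjectStream(key);
    try {
      out.write(data.getBytes(StandardCharsets.UTF_8));
      if (!commit) {
        out.abort();
      }
    } finally {
      out.close(); // a no-op after abort()
    }
  }

  /** Return the visible contents of a key, or null if nothing was published. */
  static String read(String key) {
    byte[] bytes = STORE.get(key);
    return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) throws IOException {
    checkpoint("ckpt", "v1", true);   // committed
    checkpoint("ckpt", "v2", false);  // aborted, so v1 stays visible
    System.out.println(read("ckpt")); // prints v1
    checkpoint("ckpt", "v3", true);   // committed over the existing data
    System.out.println(read("ckpt")); // prints v3
  }
}
```

The point of the model is that there is no temp file and no rename: the writer targets the final key from the start, and the existing object is replaced only if the stream is closed without having been aborted.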