> On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
>
> Actually, the current WAL implementation (as of Spark 1.5) does not work
> with S3 because S3 does not support flushing. Basically, the current
> implementation assumes that after write + flush, the data is immediately
> durable, and readable if the system crashes without closing the WAL file.
> This does not work with S3, as data is durable if and only if the S3 file
> output stream is cleanly closed.
more precisely, unless you turn multipart uploads on, the s3n/s3a clients Spark uses *don't even upload anything to S3* until the output stream is closed. S3 is not a filesystem, and you have to bear that in mind. Amazon's own S3 client used in EMR behaves differently; it may be usable as a destination (I haven't tested it).
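
to make that concrete, here's a minimal sketch against the Hadoop FileSystem API (the bucket name and paths are made-up placeholders, and credentials are assumed to be configured; this is not Spark's actual WAL code) showing why the write + flush assumption breaks on s3n/s3a:

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  object S3FlushDemo {
    def main(args: Array[String]): Unit = {
      val conf = new Configuration()
      // placeholder bucket/path; credentials assumed to be set via
      // fs.s3a.access.key / fs.s3a.secret.key or the provider chain
      val path = new Path("s3a://my-bucket/wal/segment-0001")
      val fs   = FileSystem.get(new URI("s3a://my-bucket/"), conf)

      val out = fs.create(path)
      out.write("record-1".getBytes("UTF-8"))
      out.hflush() // on HDFS this makes the record durable and readable
      // on s3a the bytes are still sitting in a local buffer file;
      // nothing has gone over the wire to S3 at this point
      println(fs.exists(path)) // false -- no object in the bucket yet

      out.close() // the object is uploaded in a single PUT here
      println(fs.exists(path)) // true -- durable only after close()
    }
  }

run the same sequence against an hdfs:// path and the hflush() call really does make the record durable before close(); that asymmetry is exactly why the current WAL can't treat S3 like HDFS.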