> On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
>
> Actually, the current WAL implementation (as of Spark 1.5) does not work
> with S3 because S3 does not support flushing. Basically, the current
> implementation assumes that after write + flush, the data is immediately
> durable, and readable if the system crashes without closing the WAL file.
> This does not work with S3, as data is durable if and only if the S3 file
> output stream is cleanly closed.
more precisely, unless you turn multipart uploads on, the s3n/s3a clients Spark uses *don't even upload anything to S3* until the output stream is closed. S3 is not a filesystem, and you have to bear that in mind. Amazon's own S3 client used in EMR behaves differently; it may be usable as a destination (I haven't tested it).
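
to make that concrete, here's a minimal sketch against the Hadoop FileSystem API (the bucket name and paths are made-up placeholders, and credentials are assumed to be configured; this is not Spark's actual WAL code) showing why the write + flush assumption breaks on s3n/s3a:

  import java.net.URI
  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.{FileSystem, Path}

  object S3FlushDemo {
    def main(args: Array[String]): Unit = {
      val conf = new Configuration()
      // placeholder bucket/path; credentials assumed to be set via
      // fs.s3a.access.key / fs.s3a.secret.key or the provider chain
      val path = new Path("s3a://my-bucket/wal/segment-0001")
      val fs   = FileSystem.get(new URI("s3a://my-bucket/"), conf)

      val out = fs.create(path)
      out.write("record-1".getBytes("UTF-8"))
      out.hflush() // on HDFS this makes the record durable and readable
      // on s3a the bytes are still sitting in a local buffer file;
      // nothing has gone over the wire to S3 at this point
      println(fs.exists(path)) // false -- no object in the bucket yet

      out.close() // the object is uploaded in a single PUT here
      println(fs.exists(path)) // true -- durable only after close()
    }
  }

run the same sequence against an hdfs:// path and the hflush() call really does make the record durable before close(); that asymmetry is exactly why the current WAL can't treat S3 like HDFS.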