I am trying to use the pluggable WAL, but it can be used only with checkpointing turned on. Thus I still need to have a Hadoop-compatible file system.
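For context, a minimal sketch of what I am configuring, in Scala (com.example.MyS3WriteAheadLog is a placeholder for my own implementation; the config keys are the ones added by SPARK-7056):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("wal-test")
  // Plug a custom WAL into both the driver and the receivers.
  .set("spark.streaming.driver.writeAheadLog.class",
       "com.example.MyS3WriteAheadLog")
  .set("spark.streaming.receiver.writeAheadLog.class",
       "com.example.MyS3WriteAheadLog")
  .set("spark.streaming.receiver.writeAheadLog.enable", "true")

val ssc = new StreamingContext(conf, Seconds(10))
// This is the part I would like to avoid: checkpointing still requires
// a Hadoop-compatible file system even though the WAL is pluggable.
ssc.checkpoint("hdfs:///checkpoints/wal-test")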
Is there something like pluggable checkpointing? Or can the WAL be used
without checkpointing? What happens when the WAL is available but the
checkpoint directory is lost? Thanks!

On 18 September 2015 at 05:47, Tathagata Das <t...@databricks.com> wrote:

> I don't think it would work with multipart upload either. The file is
> not visible until the multipart upload is explicitly closed. So even if
> each write were a part upload, none of the parts would be visible until
> the multipart upload is closed.
>
> TD
>
> On Fri, Sep 18, 2015 at 1:55 AM, Steve Loughran <ste...@hortonworks.com>
> wrote:
>
>> > On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
>> >
>> > Actually, the current WAL implementation (as of Spark 1.5) does not
>> > work with S3 because S3 does not support flushing. Basically, the
>> > current implementation assumes that after write + flush, the data is
>> > immediately durable, and readable if the system crashes without
>> > closing the WAL file. This does not work with S3, as data is durable
>> > if and only if the S3 file output stream is cleanly closed.
>>
>> More precisely, unless you turn multipart uploads on, the s3n/s3a
>> clients Spark uses *don't even upload anything to S3*.
>>
>> It's not a filesystem, and you have to bear that in mind.
>>
>> Amazon's own S3 client used in EMR behaves differently; it may be
>> usable as a destination (I haven't tested).
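PS: for reference, the interface I am implementing is the WriteAheadLog
developer API from Spark 1.5. A skeleton only; MyRecordHandle and the
??? bodies are placeholders:

import java.nio.ByteBuffer
import java.util.{Iterator => JIterator}

import org.apache.spark.streaming.util.{WriteAheadLog, WriteAheadLogRecordHandle}

// A real handle must carry whatever is needed to locate the record
// again, e.g. an S3 object key and offset.
class MyRecordHandle(val key: String) extends WriteAheadLogRecordHandle

// Spark instantiates this class reflectively; see WriteAheadLogUtils
// for the constructor signature it expects.
class MyS3WriteAheadLog extends WriteAheadLog {

  // The WAL contract assumes a record is durable as soon as write()
  // returns; this is exactly the assumption TD describes above, and the
  // one S3 cannot satisfy, since nothing is durable until the stream
  // is closed.
  override def write(record: ByteBuffer, time: Long): WriteAheadLogRecordHandle = ???

  override def read(handle: WriteAheadLogRecordHandle): ByteBuffer = ???

  override def readAll(): JIterator[ByteBuffer] = ???

  // Delete log data older than threshTime.
  override def clean(threshTime: Long, waitForCompletion: Boolean): Unit = ???

  override def close(): Unit = ???
}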