1. Currently, the WAL can be used only with checkpointing turned on,
because it does not make sense to recover from the WAL if there is no
checkpoint information to recover from.

2. Since the current implementation saves the WAL in the checkpoint
directory, they share the same fate -- if the checkpoint directory is deleted,
then both the checkpoint info and the WAL info are deleted.

3. Checkpointing is currently not pluggable. Why do you want that?
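
For points 1 and 2, here is a minimal PySpark sketch of how the two settings go together. The property `spark.streaming.receiver.writeAheadLog.enable` and `ssc.checkpoint()` are the standard Spark Streaming ones; the helper names, the app name, and the batch interval are made up for illustration, and the checkpoint directory is whatever Hadoop-compatible path you pass in.

```python
# Illustrative sketch only: helper names and the app name are hypothetical.

def wal_settings():
    """Spark properties that turn on the receiver write-ahead log."""
    return {"spark.streaming.receiver.writeAheadLog.enable": "true"}


def create_context(checkpoint_dir, batch_seconds=10):
    """Build a StreamingContext with both the WAL and checkpointing enabled."""
    # Imported here so the settings helper above works without a Spark install.
    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    conf = SparkConf().setAppName("wal-sketch")
    for key, value in wal_settings().items():
        conf.set(key, value)

    ssc = StreamingContext(SparkContext(conf=conf), batch_seconds)
    # The WAL files are written under this directory as well, which is why
    # deleting it loses both the checkpoint info and the WAL data.
    ssc.checkpoint(checkpoint_dir)
    return ssc
```

On restart, the driver would typically go through `StreamingContext.getOrCreate(checkpoint_dir, lambda: create_context(checkpoint_dir))`, so that it recovers from the checkpoint if one exists and builds a fresh context otherwise.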



On Tue, Sep 22, 2015 at 4:53 PM, Michal Čizmazia <mici...@gmail.com> wrote:

> I am trying to use the pluggable WAL, but it can be used only with
> checkpointing turned on. Thus I still need to have a Hadoop-compatible file
> system.
>
> Is there something like pluggable checkpointing?
>
> Or can WAL be used without checkpointing? What happens when WAL is
> available but the checkpoint directory is lost?
>
> Thanks!
>
>
> On 18 September 2015 at 05:47, Tathagata Das <t...@databricks.com> wrote:
>
>> I don't think it would work with multipart upload either. The file is not
>> visible until the multipart upload is explicitly closed. So even if each
>> write is a part upload, none of the parts are visible until the multipart
>> upload is closed.
>>
>> TD
>>
>> On Fri, Sep 18, 2015 at 1:55 AM, Steve Loughran <ste...@hortonworks.com>
>> wrote:
>>
>>>
>>> > On 17 Sep 2015, at 21:40, Tathagata Das <t...@databricks.com> wrote:
>>> >
>>> > Actually, the current WAL implementation (as of Spark 1.5) does not
>>> > work with S3 because S3 does not support flushing. Basically, the current
>>> > implementation assumes that after write + flush, the data is immediately
>>> > durable, and readable if the system crashes without closing the WAL file.
>>> > This does not work with S3, as data is durable if and only if the S3 file
>>> > output stream is cleanly closed.
>>> >
>>>
>>>
>>> More precisely, unless you turn multipart uploads on, the s3n/s3a
>>> clients Spark uses *don't even upload anything to S3*.
>>>
>>> It's not a filesystem, and you have to bear that in mind.
>>>
>>> Amazon's own S3 client used in EMR behaves differently; it may be usable
>>> as a destination (I haven't tested).
>>>
>>>
>>
>
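
To illustrate the flush-versus-close point quoted above, here is a toy Python model. It is only a sketch: the classes are hypothetical, they do not call Spark, HDFS, or S3, and they only mimic the two durability behaviours described in the thread (data durable after write + flush, versus data visible only once the stream is cleanly closed).

```python
# Toy model only: these classes are hypothetical and do not call Spark or S3.

class FlushDurableLog:
    """HDFS-like log: records become durable as soon as flush() returns."""

    def __init__(self):
        self._buffer = []
        self.durable = []  # what a reader would see after a crash

    def write(self, record):
        self._buffer.append(record)

    def flush(self):
        self.durable.extend(self._buffer)
        self._buffer = []

    def close(self):
        self.flush()


class CloseDurableLog(FlushDurableLog):
    """S3-like object: nothing is visible until close() completes."""

    def flush(self):
        # Flushing cannot publish a partial object, so it is a no-op here.
        pass

    def close(self):
        self.durable.extend(self._buffer)
        self._buffer = []


def records_surviving_crash(log, records):
    """Write and flush each record, then 'crash' without calling close()."""
    for record in records:
        log.write(record)
        log.flush()
    return list(log.durable)
```

With `FlushDurableLog`, every flushed record survives the simulated crash; with `CloseDurableLog`, none do, which is the assumption mismatch that keeps the current WAL from working on S3.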
