Thanks Gary!

Sure, there are issues with updates in S3. You may want to look at the
consistent view guarantees that EMRFS provides [1]. I'm not sure whether
anything similar is possible on AWS outside of EMR.
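
If I read [1] correctly, on EMR itself it is enabled through the emrfs-site
configuration classification, roughly like the sketch below (property names
as in [1]; the retry values are only example numbers, not a recommendation):

[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "fs.s3.consistent": "true",
      "fs.s3.consistent.retryCount": "5",
      "fs.s3.consistent.retryPeriodSeconds": "10"
    }
  }
]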

I'm creating a JIRA issue about the possibility of data loss in S3. IMHO,
the Flink docs should mention that data loss is possible with S3.
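
For context, here is a rough sketch of the kind of job described in my
quoted message below; the topic, bucket, and connector version
(FlinkKafkaConsumer010) are placeholders, not our actual code:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010;

public class ClickStreamToS3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing is on, but the buffered-yet-unflushed part files are
        // exactly the data I am worried about losing on a TM failure.
        env.enableCheckpointing(60_000);

        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", "kafka:9092");
        kafkaProps.setProperty("group.id", "clickstream-s3");

        DataStream<String> clicks = env.addSource(
                new FlinkKafkaConsumer010<>("clickstream", new SimpleStringSchema(), kafkaProps));

        BucketingSink<String> sink = new BucketingSink<>("s3a://my-bucket/clickstream");
        sink.setBucketer(new DateTimeBucketer<>("yyyy-MM-dd--HH"));
        sink.setWriter(new StringWriter<>());
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at ~128 MB

        clicks.addSink(sink);
        env.execute("clickstream-to-s3");
    }
}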

[1] 
https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-consistent-view.html

--
Thanks,
Amit

On Fri, May 18, 2018 at 2:48 AM, Gary Yao <g...@data-artisans.com> wrote:
> Hi Amit,
>
> The BucketingSink doesn't have well defined semantics when used with S3.
> Data loss is possible but I am not sure whether it is the only problem.
> There are plans to rewrite the BucketingSink in Flink 1.6 to enable
> eventually consistent file systems [1][2].
>
> Best,
> Gary
>
>
> [1]
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/sink-with-BucketingSink-to-S3-files-override-td18433.html
> [2] https://issues.apache.org/jira/browse/FLINK-6306
>
> On Thu, May 17, 2018 at 11:57 AM, Amit Jain <aj201...@gmail.com> wrote:
>>
>> Hi,
>>
>> We are using Flink to process click-stream data from Kafka and push it
>> to S3 in 128 MB files.
>>
>> What are the message processing guarantees with the S3 sink? In my
>> understanding, the S3A client buffers the data in memory/on disk. In a
>> failure scenario on a particular node, the TM would not trigger
>> Writer#close, hence the buffered data can be lost entirely, assuming
>> this buffer contains data from the last successful checkpoint.
>>
>> --
>> Thanks,
>> Amit
>
>
