Thanks Gary!
Sure, there are issues with updates in S3. You may want to look over
the EMRFS consistent view guarantees [1]. I'm not sure whether the
same is possible in a non-EMR AWS setup or not.
I'm creating a JIRA issue regarding the possibility of data loss in
S3. IMHO, the Flink docs should mention this possible data loss.
Hi Amit,
The BucketingSink doesn't have well-defined semantics when used with S3.
Data loss is possible, but I am not sure whether it is the only problem.
There are plans to rewrite the BucketingSink in Flink 1.6 to support
eventually consistent file systems [1][2].
Best,
Gary
[1] http://apache-f
Hi Rong,
We are using the BucketingSink only. I'm looking at the case where the
TM does not get the chance to call Writer#flush, e.g., when YARN kills
the TM because of OOM. We have configured fs.s3.impl to
com.amazon.ws.emr.hadoop.fs.EmrFileSystem in core-site.xml, so the
BucketingSink is using the S3 client internally.
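For illustration, a minimal sketch of how that mapping takes effect
(the bucket name is a placeholder; assumes hadoop-common and the EMRFS
jar are on the classpath):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsImplCheck {
    public static void main(String[] args) throws Exception {
        // new Configuration() loads core-default.xml and core-site.xml
        // from the classpath.
        Configuration conf = new Configuration();

        // With fs.s3.impl=com.amazon.ws.emr.hadoop.fs.EmrFileSystem this
        // returns an EmrFileSystem instance; the BucketingSink resolves
        // the FileSystem for its base path through the same factory.
        FileSystem fs = FileSystem.get(URI.create("s3://my-bucket/"), conf);
        System.out.println(fs.getClass().getName());
    }
}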
Hi Amit,
Can you elaborate on how you write using the "S3 sink" and which
version of Flink you are using?
If you are using BucketingSink [1], you can check out the API doc and
configure it to flush before closing your sink.
This way your sink is "integrated with the checkpointing mechanism to
provide exactly-once semantics".
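For reference, a minimal sketch of such a configuration (the path,
bucket format, and sizes are placeholders, not values from this thread):

import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.fs.bucketing.DateTimeBucketer;

public class S3SinkConfig {
    static BucketingSink<String> createSink() {
        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/clicks");
        // One bucket directory per hour.
        sink.setBucketer(new DateTimeBucketer<String>("yyyy-MM-dd--HH"));
        sink.setWriter(new StringWriter<String>());
        // Roll the in-progress part file once it reaches ~128 MB.
        sink.setBatchSize(128L * 1024 * 1024);
        // Close buckets that have gone idle so data isn't held open forever.
        sink.setInactiveBucketThreshold(10 * 60 * 1000L);
        sink.setInactiveBucketCheckInterval(60 * 1000L);
        return sink;
    }
}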
Hi,
We are using Flink to process clickstream data from Kafka and push it
into 128 MB files in S3.
What are the message processing guarantees with the S3 sink? In my
understanding, the S3A client buffers the data in memory/on disk. In a
failure scenario on a particular node, the TM would not trigger
Writer#close, so the buffered data may be lost.
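For context, a minimal sketch of this kind of pipeline (topic name,
Kafka properties, and the connector version are assumptions, not
details from this thread):

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.fs.StringWriter;
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011;

public class ClickstreamToS3 {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Checkpointing drives the BucketingSink's pending -> final file
        // transitions; without it the sink cannot give any guarantees.
        env.enableCheckpointing(60_000L, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // assumption
        props.setProperty("group.id", "clickstream");         // assumption

        DataStream<String> clicks = env.addSource(
                new FlinkKafkaConsumer011<>("clicks", new SimpleStringSchema(), props));

        BucketingSink<String> sink = new BucketingSink<>("s3://my-bucket/clicks");
        sink.setWriter(new StringWriter<String>());
        sink.setBatchSize(128L * 1024 * 1024); // roll part files at ~128 MB

        clicks.addSink(sink);
        env.execute("clickstream-to-s3");
    }
}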