Hi Li,

This is the expected behavior. All the "exactly-once" sinks in Flink require checkpointing to be enabled: rolling a part file only moves it to the "pending" state, and pending files (the open multipart uploads you are seeing on S3) are only committed when the next checkpoint completes. We will update the documentation to be clearer in the upcoming release.
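For reference, a rough sketch of the wiring on 1.9 (the bucket path, encoder, and intervals below are placeholders, not taken from your job; the explicit rolling policy just spells out the row-format defaults):

    import java.util.concurrent.TimeUnit;
    import org.apache.flink.api.common.serialization.SimpleStringEncoder;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
    import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    // Without this, rolled part files stay "pending" (open multipart
    // uploads on S3) and are never committed.
    env.enableCheckpointing(TimeUnit.MINUTES.toMillis(1));

    StreamingFileSink<String> sink = StreamingFileSink
            .forRowFormat(new Path("s3://my-bucket/output"),   // placeholder bucket/path
                          new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(DefaultRollingPolicy.create()
                    .withRolloverInterval(TimeUnit.MINUTES.toMillis(1))  // roll at most every minute
                    .withInactivityInterval(TimeUnit.MINUTES.toMillis(1))
                    .withMaxPartSize(128 * 1024 * 1024)
                    .build())
            .build();

With this in place, each checkpoint completes the pending multipart uploads, so you should see finished part files appear in S3 roughly once per checkpoint interval rather than having to tune the Hadoop buffer settings.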
Thanks a lot,
Kostas

On Sat, Dec 7, 2019 at 3:47 AM Li Peng <li.p...@doordash.com> wrote:
>
> Ok, I seem to have solved the issue by enabling checkpointing. Based on the
> docs (I'm using 1.9.0), it seemed like only StreamingFileSink.forBulkFormat()
> should've required checkpointing, but based on this experience,
> StreamingFileSink.forRowFormat() requires it too! Is this the intended
> behavior? If so, the docs should probably be updated.
>
> Thanks,
> Li
>
> On Fri, Dec 6, 2019 at 2:01 PM Li Peng <li.p...@doordash.com> wrote:
>>
>> Hey folks, I'm trying to get StreamingFileSink to write to s3 every minute,
>> with flink-s3-fs-hadoop, and based on the default rolling policy, which is
>> configured to "roll" every 60 seconds, I thought that would be automatic (I
>> interpreted rolling to mean actually closing a multipart upload to s3).
>>
>> But I'm not actually seeing any files written to s3 at all. Instead, I see
>> a bunch of open multipart uploads when I check the AWS s3 console, for
>> example:
>>
>> "Uploads": [
>>     {
>>         "Initiated": "2019-12-06T20:57:47.000Z",
>>         "Key": "2019-12-06--20/part-0-0"
>>     },
>>     {
>>         "Initiated": "2019-12-06T20:57:47.000Z",
>>         "Key": "2019-12-06--20/part-1-0"
>>     },
>>     {
>>         "Initiated": "2019-12-06T21:03:12.000Z",
>>         "Key": "2019-12-06--21/part-0-1"
>>     },
>>     {
>>         "Initiated": "2019-12-06T21:04:15.000Z",
>>         "Key": "2019-12-06--21/part-0-2"
>>     },
>>     {
>>         "Initiated": "2019-12-06T21:22:23.000Z",
>>         "Key": "2019-12-06--21/part-0-3"
>>     }
>> ]
>>
>> These uploads stay open for a long time; after an hour, none of them have
>> been closed. Is this the expected behavior? If I wanted to get these
>> uploads to actually write to s3 quickly, do I need to configure the Hadoop
>> filesystem to get that done, like setting a smaller buffer/partition size
>> to force it to upload?
>>
>> Thanks,
>> Li