Hey folks, I'm trying to stream a large volume of data and write it out as CSV files to S3. One of the requirements is to keep each file below 100MB (compressed) and to write one file per minute. I wanted to verify my understanding of StreamingFileSink with you:
1. From the docs, StreamingFileSink uses multipart upload with S3, so even with many workers writing to S3, it will still output only one file for all of them per time window, right?

2. StreamingFileSink.forRowFormat can be configured to write individual rows and then commit them to disk per the rules above, by specifying a RollingPolicy with the file size limit and the rollover interval, correct? And do the limit and the interval apply to the entire file, not to each part file? (See the sketch in the P.S. for what I have in mind.)

3. To write to S3, is it enough to just add flink-s3-fs-hadoop as a dependency and specify the file path as "s3://file"?

Thanks,
Li
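
P.S. In case it helps clarify what I'm asking in (2) and (3), here's a rough sketch of the sink I have in mind. This is written against the Flink 1.9-style API, assuming flink-s3-fs-hadoop is on the classpath; the bucket name "my-bucket", the output prefix, and the csvLines stream are just placeholders I made up:

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.DefaultRollingPolicy;

// csvLines is assumed to be a DataStream<String> of already-formatted CSV rows
void attachCsvSink(DataStream<String> csvLines) {
    StreamingFileSink<String> sink = StreamingFileSink
            // row format: one record per CSV line, encoded as UTF-8
            .forRowFormat(new Path("s3://my-bucket/csv-output"),
                          new SimpleStringEncoder<String>("UTF-8"))
            .withRollingPolicy(
                    DefaultRollingPolicy.builder()
                            .withMaxPartSize(100L * 1024 * 1024) // the 100MB size limit
                            .withRolloverInterval(60_000L)       // roll once per minute
                            .build())
            .build();

    csvLines.addSink(sink);
}

My question (2) is basically whether withMaxPartSize and withRolloverInterval here bound the final committed file, or only each in-progress part file.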