Thanks for your feedback, Till. I think in this scenario the best approach
is to go with the ThreadPool.
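For anyone finding this thread later, the ThreadPool approach Till describes can be sketched roughly as below. This is only an illustration, not Flink code: `uploadFile` is a hypothetical stand-in for whatever your S3 client exposes, and in a real `TwoPhaseCommitSinkFunction` you would do this inside the commit path and wait on all futures before acknowledging, so exactly-once is preserved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelS3Write {
    // Hypothetical stand-in for an S3 client upload call;
    // replace with your actual client (e.g. an AWS SDK putObject).
    static String uploadFile(String key) {
        return "uploaded:" + key;
    }

    public static void main(String[] args) throws Exception {
        // A small fixed pool; size it to your S3 client's concurrency limits.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<String>> futures = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            final String key = "part-" + i;
            futures.add(pool.submit(() -> uploadFile(key)));
        }
        // Block until every upload finishes before completing the commit,
        // so no file is silently lost if one upload fails.
        for (Future<String> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

Submitting all uploads first and only then calling `get()` on each future is what turns the sequential writes into concurrent ones, while still failing the commit (via the exception from `get()`) if any single upload fails.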

On Fri, Apr 3, 2020 at 1:47 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi David,
>
> I assume that you have written your own TwoPhaseCommitSink which writes to
> S3, right? If that is the case, then it is mainly up to your implementation
> how it writes files to S3. If your S3 client supports uploading multiple
> files concurrently, then you should go for it.
>
> Async I/O won't help you much in this scenario if you have strict exactly
> once guarantees. If you can tolerate at least once guarantees, then you
> could try to build an async operator which writes files to S3. But you
> could do the same in your custom TwoPhaseCommitSink implementation by
> spawning a ThreadPool and submitting multiple write operations.
>
> Cheers,
> Till
>
> On Fri, Apr 3, 2020 at 2:21 PM David Magalhães <speeddra...@gmail.com>
> wrote:
>
>> I have a scenario where multiple small files need to be written on S3.
>> I'm using TwoPhaseCommit sink since I have a specific scenario where I
>> can't use StreamingFileSink.
>>
>> I've noticed that because of the way the S3 write is done (sequentially),
>> the checkpoint is timing out (10 minutes), because it takes too much time
>> to write multiple files to S3. I've searched for a bit and found this
>> documentation,
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
>>
>> Would this be the best way to write multiple files to S3 without waiting
>> for one file to be completed before writing the next one?
>>
>> Thanks!
>>
>
