Thanks for your feedback, Till. I think in this scenario the best approach is to go with the ThreadPool.
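For anyone following along, here is a minimal sketch of what that could look like: submitting all the uploads to an ExecutorService and then blocking until every one has finished before letting the checkpoint complete. The class and method names are illustrative only (this is not Flink or AWS SDK API), and `uploadToS3` is a hypothetical stub standing in for the real S3 client call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the ThreadPool approach inside a TwoPhaseCommitSink-style
// commit step. All names here are illustrative; uploadToS3 is a
// hypothetical placeholder for your real S3 client call.
public class ConcurrentS3Committer {

    private final ExecutorService pool;

    public ConcurrentS3Committer(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Stand-in for the real S3 PUT; replace with your S3 client.
    protected String uploadToS3(String key, byte[] data) {
        return key; // pretend the upload succeeded and return the key
    }

    // Submit all uploads concurrently, then block until each one has
    // finished. Blocking before returning is what keeps the exactly-once
    // guarantee: the commit (and thus the checkpoint) only completes once
    // every file has been durably written.
    public List<String> commitAll(List<String> keys, byte[] data)
            throws InterruptedException, ExecutionException {
        List<Future<String>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(pool.submit(() -> uploadToS3(key, data)));
        }
        List<String> written = new ArrayList<>();
        for (Future<String> f : futures) {
            written.add(f.get()); // rethrows any upload failure
        }
        return written;
    }

    public void close() {
        pool.shutdown();
    }
}
```

The key point from Till's mail is preserved here: the writes overlap in time, but the commit still waits for all of them, so a checkpoint never completes with a pending upload.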
On Fri, Apr 3, 2020 at 1:47 PM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi David,
>
> I assume that you have written your own TwoPhaseCommitSink which writes to
> S3, right? If that is the case, then it is mainly up to your implementation
> how it writes files to S3. If your S3 client supports uploading multiple
> files concurrently, then you should go for it.
>
> Async I/O won't help you much in this scenario if you have strict exactly
> once guarantees. If you can tolerate at-least-once guarantees, then you
> could try to build an async operator which writes files to S3. But you
> could do the same in your custom TwoPhaseCommitSink implementation by
> spawning a ThreadPool and submitting multiple write operations.
>
> Cheers,
> Till
>
> On Fri, Apr 3, 2020 at 2:21 PM David Magalhães <speeddra...@gmail.com>
> wrote:
>
>> I have a scenario where multiple small files need to be written on S3.
>> I'm using a TwoPhaseCommit sink since I have a specific scenario where I
>> can't use StreamingFileSink.
>>
>> I've noticed that because of the way the S3 write is done (sequentially),
>> the checkpoint is timing out (10 minutes), because it takes too much time
>> to write multiple files to S3. I've searched a bit and found this
>> documentation:
>> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/operators/asyncio.html
>>
>> Would this be the best way to write multiple files to S3 without waiting
>> for one file to complete before writing the next one?
>>
>> Thanks!