A heads up on this front:

- For checkpointing with the state backends, I would suggest using flink-s3-fs-presto, which is quite a bit faster than flink-s3-fs-hadoop because it avoids a bunch of unnecessary metadata operations.
- We have started work on rewriting the Bucketing Sink to make it work with the shaded S3 filesystems (like flink-s3-fs-presto). We are also adding a more powerful internal abstraction that uses multipart uploads for faster incremental persistence of result chunks on checkpoints. This should be in Flink 1.6; happy to share more as soon as it is out.

On Wed, Feb 7, 2018 at 3:56 PM, Marchant, Hayden <hayden.march...@citi.com> wrote:

> We actually got it working. Essentially, it's an implementation of
> HadoopFileSystem, and was written with the idea that it can be used with
> Spark (since Spark has broader adoption than Flink as of now). We managed
> to get it configured, and found the latency to be much lower than with the
> s3 connector. Far fewer copy operations happen under the hood when using
> this native API, which explains the better performance.
>
> Happy to provide assistance offline if you're interested.
>
> Thanks
> Hayden
>
> -----Original Message-----
> From: Edward Rojas [mailto:edward.roja...@gmail.com]
> Sent: Thursday, February 01, 2018 6:09 PM
> To: user@flink.apache.org
> Subject: RE: S3 for state backend in Flink 1.4.0
>
> Hi Hayden,
>
> It seems like a good alternative. But I see it's intended to work with
> Spark; did you manage to get it working with Flink?
>
> I ran some tests, but I get several errors when trying to create a file,
> either for checkpointing or for saving data.
>
> Thanks in advance,
> Regards,
> Edward
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/