I finally found the time to dig a little deeper into this and found the real
problem.
The culprit behind the slowdown is this piece of code:
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-filesystem/src/main/java/org/apache/flink/streaming/connectors/fs/bucketing/BucketingSi
I'm starting this discussion here after an initial conversation with the
Ververica and AWS teams during FlinkForward.
I'm investigating the performance of a Flink job that transports data from
Kafka to an S3 sink.
We are using a BucketingSink to write Parquet files. The bucketing logic
divides the message
ry.
>
> Out of curiosity, I guess that in the BucketingSink you were using the
> AvroKeyValueSinkWriter, right?
>
> Cheers,
> Kostas
>
> On Fri, Aug 30, 2019 at 10:23 AM Enrico Agnoli
> wrote:
> >
> > StreamingFile limitations
> >
> > Hi
StreamingFile limitations
Hi community,
I'm working on porting our code from `BucketingSink<>` to
`StreamingFileSink`.
In this case we use the sink to write Avro via Parquet, and the suggested
implementation of the sink should be something like:
```
val parquetWriterFactory = Parquet