Re: [DISCUSS] Pluggable Batching for Async Sink in Flink

Ahmed Hamdy Wed, 12 Feb 2025 03:21:07 -0800

Hi Poorvank,
Thanks for driving this, +1 (non-binding) for the FLIP in general. While I
see some common ground with FLIP-284, which I wrote, I prefer going with a
new FLIP since it is driven by a current use case and targets it
specifically. I have a few clarifying questions.

1- I assume we will not expose batch creators to users but only to sink
implementers, am I correct?

2- This is a nit, but I would prefer adding the creator to
`AsyncSinkWriterConfiguration` rather than overloading the constructor.
There are already established ways of instantiating the configs with
defaults, so this would be simpler.
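
To make point 2 concrete, here is a minimal sketch of a configuration
builder that carries the batch creator and falls back to a default when
none is set. Everything in it is an assumption for illustration:
`BatchCreatorSketch`, `WriterConfigSketch`, `setBatchCreator` and the
default of 500 are hypothetical names and values, not the FLIP's interface
or the real `AsyncSinkWriterConfiguration`.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical shape of the proposed batch creator; the FLIP's actual
// signature may differ.
interface BatchCreatorSketch<T> {
    List<T> createNextBatch(Deque<T> buffer, int maxBatchSize);
}

// Simplified stand-in for AsyncSinkWriterConfiguration (NOT the Flink class).
final class WriterConfigSketch<T> {
    final int maxBatchSize;
    final BatchCreatorSketch<T> batchCreator;

    private WriterConfigSketch(int maxBatchSize, BatchCreatorSketch<T> batchCreator) {
        this.maxBatchSize = maxBatchSize;
        this.batchCreator = batchCreator;
    }

    static <T> Builder<T> builder() {
        return new Builder<>();
    }

    static final class Builder<T> {
        private int maxBatchSize = 500; // illustrative default only

        // Default creator: drain the buffer in FIFO order up to maxBatchSize,
        // so sinks that never set a creator keep count-based batching.
        private BatchCreatorSketch<T> batchCreator = (buffer, max) -> {
            List<T> batch = new ArrayList<>();
            while (!buffer.isEmpty() && batch.size() < max) {
                batch.add(buffer.pollFirst());
            }
            return batch;
        };

        Builder<T> setMaxBatchSize(int maxBatchSize) {
            this.maxBatchSize = maxBatchSize;
            return this;
        }

        // Optional: only sink implementers with special needs would call this.
        Builder<T> setBatchCreator(BatchCreatorSketch<T> batchCreator) {
            this.batchCreator = batchCreator;
            return this;
        }

        WriterConfigSketch<T> build() {
            return new WriterConfigSketch<>(maxBatchSize, batchCreator);
        }
    }
}
```

With this shape the writer constructor keeps taking a single configuration
object, and existing sinks compile unchanged because the default applies.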

3- I want to borrow a concept from FLIP-284 that we have previously seen
in DynamoDbAsyncSink. Do you think we can also make the trigger
customizable, for example by adding a "readyToFlush()" method or something
similar and using it in the `nonBlockingFlush` methods, instead of forcing
the trigger to depend only on total record count or buffer size? I see this
as useful in your case as well: if we have buffered 100 requests from 100
different partitions, we will keep triggering and emit 100 batches of 1
(unless maxTimeInBufferMS is used, of course), so you could use a custom
trigger that waits for a minimum number of buffered records per partition.
This might be a bit tricky: to avoid hurting performance, we would probably
have to wrap the whole buffer in the batchCreator so that it can maintain
values like size in bytes or per-partition record count, which makes it
more of a "bufferHandler" than a "batchCreator". Let me know your thoughts
on this.
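
As a rough illustration of point 3, the sketch below shows a buffer that
owns its records and keeps a per-partition count up to date on every add
and drain, so that the readiness check is O(1). All names here
(`PartitionedBufferSketch`, `minRecordsPerPartition`, `nextBatch`) are
hypothetical, and only "readyToFlush()" comes from the suggestion above.
It is a sketch under those assumptions, not a proposed implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical "bufferHandler": it owns the buffer, so per-partition state
// is updated incrementally instead of being recomputed on every flush check.
final class PartitionedBufferSketch<T, K> {
    private final Map<K, Deque<T>> byPartition = new LinkedHashMap<>();
    private final Function<T, K> partitionOf;
    private final int minRecordsPerPartition; // assumed to be >= 1
    private final int maxBatchSize;
    private int readyPartitions = 0; // partitions holding >= minRecordsPerPartition

    PartitionedBufferSketch(
            Function<T, K> partitionOf, int minRecordsPerPartition, int maxBatchSize) {
        this.partitionOf = partitionOf;
        this.minRecordsPerPartition = minRecordsPerPartition;
        this.maxBatchSize = maxBatchSize;
    }

    void add(T record) {
        Deque<T> queue =
                byPartition.computeIfAbsent(partitionOf.apply(record), k -> new ArrayDeque<>());
        queue.addLast(record);
        // The partition becomes ready exactly when it reaches the threshold.
        if (queue.size() == minRecordsPerPartition) {
            readyPartitions++;
        }
    }

    // Custom trigger: true only if some partition has enough records.
    boolean readyToFlush() {
        return readyPartitions > 0;
    }

    // Returns a batch drawn from a single ready partition, or an empty list.
    List<T> nextBatch() {
        Iterator<Map.Entry<K, Deque<T>>> it = byPartition.entrySet().iterator();
        while (it.hasNext()) {
            Deque<T> queue = it.next().getValue();
            if (queue.size() < minRecordsPerPartition) {
                continue;
            }
            List<T> batch = new ArrayList<>();
            while (!queue.isEmpty() && batch.size() < maxBatchSize) {
                batch.add(queue.pollFirst());
            }
            if (queue.size() < minRecordsPerPartition) {
                readyPartitions--; // partition dropped below the threshold
            }
            if (queue.isEmpty()) {
                it.remove();
            }
            return batch;
        }
        return new ArrayList<>();
    }
}
```

With a minimum of 2, 100 records from 100 distinct partitions would not
trigger a flush in this sketch. A time-based flush such as
maxTimeInBufferMS would still be needed to drain partitions that never
reach the minimum.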

Best Regards
Ahmed Hamdy

On Tue, 11 Feb 2025 at 10:45, Poorvank Bhatia <puravbhat...@gmail.com>
wrote:

> Hey everyone,
>
> I’d like to propose adding a pluggable batching mechanism to
> AsyncSinkWriter
> <
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L351
> >
>  to enable custom batch formation strategies.
> Currently, batching is based on batch size and record count, but this
> approach is suboptimal for sinks like Cassandra, which require
> partition-aware batching. Specifically, batches should be formed so that
> all requests within a batch belong to the same partition, ensuring more
> efficient writes.
>
> The proposal introduces a minimal `BatchCreator` interface, enabling users
> to define custom batching strategies while maintaining backward
> compatibility with a default implementation.
>
> For full details, please refer to the proposal document
> <
> https://docs.google.com/document/d/1XI2DV-8r-kOwbMd2ZMdV4u0Q_s5m8Stojv4HSdJA8ZU/edit?tab=t.0#heading=h.n4fv4r64xk2f
> >
> .
> Associated Jira <https://issues.apache.org/jira/browse/FLINK-37298>
>
> Thanks,
> Poorvank
>
