Hi Poorvank, Thanks for driving this, +1 (non-binding) for the FLIP in general. While I see some common grounds with FLIP-284 that I wrote, I prefer going with a new FLIP since it is driven by a current use case and is specific for tackling it. I have a couple of clarifying questions.
1- I assume we will not expose batch creators to users but only to sink implementers, am I correct? 2- It is nit: but I would prefer adding the creator to the `AsyncSinkWriterConfiguration` rather than overloading the constructor, there are some already implemented ways for instantiating the configs with defaults so would be simpler. 3- I want to leverage some concept from FLIP-284 that we have previously seen in DynamoDbAsyncSink, do you think we can also customize the trigger so we can also add a "readyToFlush()" method or something similar and use it in `nonBlockingFlush` methods instead of forcing the trigger on only total record count or buffer size. I see this useful in your case as well because if we have buffered 100 requests from 100 different partitions then we will keep triggering and emitting batches of 1 for 100 times (unless maxTimeInBufferMS is used ofc) so you can use the custom trigger for minimum buffered records per partition. This might be a bit tricky because in order to not affect the performance we probably will wrap the whole buffer in the batchCreator to maintain calculations like size in bytes or per partition record count in the batch creator making it more "bufferHandler" rather than a "batchCreator", let me know your thoughts about this. Best Regards Ahmed Hamdy On Tue, 11 Feb 2025 at 10:45, Poorvank Bhatia <puravbhat...@gmail.com> wrote: > Hey everyone, > > I’d like to propose adding a pluggable batching mechanism to > AsyncSinkWriter > < > https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-base/src/main/java/org/apache/flink/connector/base/sink/writer/AsyncSinkWriter.java#L351 > > > to enable custom batch formation strategies. > Currently, batching is based on batch size and record count, but this > approach is suboptimal for sinks like Cassandra, which require > partition-aware batching. Specifically, batches should be formed so that > all requests within a batch belong to the same partition, ensuring more > efficient writes. > > The proposal introduces a minimal `BatchCreator` interface, enabling users > to define custom batching strategies while maintaining backward > compatibility with a default implementation. > > For full details, please refer to the proposal document > < > https://docs.google.com/document/d/1XI2DV-8r-kOwbMd2ZMdV4u0Q_s5m8Stojv4HSdJA8ZU/edit?tab=t.0#heading=h.n4fv4r64xk2f > > > . > Associated Jira <https://issues.apache.org/jira/browse/FLINK-37298> > > Thanks, > Poorvank >