Just to clarify, the native KCL/KPL aggregation [1] handles the partition key rebalancing for you out of the box.
[1] https://docs.aws.amazon.com/streams/latest/dev/kinesis -kpl-concepts.html#kinesis-kpl-concepts-aggretation On Thu, Apr 14, 2022 at 8:48 AM Danny Cranmer <dannycran...@apache.org> wrote: > Hey Blake, > > I am happy to take a look, but I will not have capacity until next week. > > The current way to achieve multiple records per PUT is to use the native > KCL/KPL aggregation [1], which is supported by the Flink connector. A > downside of aggregation is that the sender has to manage the partitioning > strategy. For example, each record in your list will be sent to the same > shard. If the sender implements grouping of records by partition key, then > care needs to be taken during shard scaling. > > Thanks, > > [1] > https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation > > > On Tue, Apr 12, 2022 at 3:52 AM Blake Wilson <bl...@yellowpapersun.com> > wrote: > >> Hello, I recently submitted a pull request to support the Collector API >> for >> the Kinesis Streams Connector. >> >> The ability to use this API would save a great deal of shuttling bytes >> around in multiple Flink programs I've worked on. This is because to >> construct a stream of the desired type without Collector support, the >> Kinesis source must emit a List[Type], and this must be flattened to a >> Type >> stream. >> >> Because of the way Kinesis pricing works, it rarely makes sense to send >> one >> value per Kinesis record. In provisioned mode, Kinesis PUTs are priced to >> the nearest 25KB (https://aws.amazon.com/kinesis/data-streams/pricing/), >> so >> records are more sensibly packed with multiple values unless these values >> are quite large. Therefore, I suspect the need to handle multiple values >> per Kinesis record is quite common. >> >> The PR is located at https://github.com/apache/flink/pull/19417, and I'd >> love to get some feedback on Github or here. >> >> Thanks! >> >