Hey Blake, I am happy to take a look, but I will not have capacity until next week.
The current way to achieve multiple records per PUT is to use the native
KCL/KPL aggregation [1], which is supported by the Flink connector. A
downside of aggregation is that the sender has to manage the partitioning
strategy: for example, every record in your list will be sent to the same
shard. If the sender implements grouping of records by partition key, then
care needs to be taken during shard scaling. (A minimal sketch of enabling
KPL aggregation on the producer side is included below the quoted message.)

Thanks,

[1] https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation

On Tue, Apr 12, 2022 at 3:52 AM Blake Wilson <bl...@yellowpapersun.com> wrote:

> Hello, I recently submitted a pull request to support the Collector API
> for the Kinesis Streams Connector.
>
> The ability to use this API would save a great deal of shuttling bytes
> around in multiple Flink programs I've worked on. This is because, to
> construct a stream of the desired type without Collector support, the
> Kinesis source must emit a List[Type], and this must then be flattened
> into a stream of Type.
>
> Because of the way Kinesis pricing works, it rarely makes sense to send
> one value per Kinesis record. In provisioned mode, Kinesis PUTs are priced
> to the nearest 25KB (https://aws.amazon.com/kinesis/data-streams/pricing/),
> so records are more sensibly packed with multiple values unless those
> values are quite large. I therefore suspect the need to handle multiple
> values per Kinesis record is quite common.
>
> The PR is located at https://github.com/apache/flink/pull/19417, and I'd
> love to get some feedback on GitHub or here.
>
> Thanks!
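P.S. For concreteness, here is a minimal sketch of what enabling KPL
aggregation looks like with the legacy FlinkKinesisProducer. The stream
name, region, sample data, and SimpleStringSchema are placeholders, and the
KPL property keys should be double-checked against the connector version
you are on, but the shape is roughly:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisProducer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;

public class AggregatedSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder upstream pipeline; in a real job this comes from your sources.
        DataStream<String> records = env.fromElements("{\"id\":1}", "{\"id\":2}");

        Properties producerConfig = new Properties();
        // Region is a placeholder; credentials come from the default provider chain.
        producerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
        // Native KPL aggregation from [1]: many user records are packed into one
        // Kinesis record. All records in an aggregate carry the same partition
        // key, so they land on the same shard -- the caveat mentioned above.
        producerConfig.put("AggregationEnabled", "true");
        producerConfig.put("AggregationMaxCount", "100");

        FlinkKinesisProducer<String> producer =
                new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
        producer.setDefaultStream("example-stream"); // placeholder stream name
        producer.setDefaultPartition("0");           // fixed key: everything on one shard

        records.addSink(producer);
        env.execute("KPL aggregation sketch");
    }
}

The fixed default partition is only there to make the single-shard caveat
visible; in practice you would plug in a custom partitioner and, as noted
above, take care around resharding.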