Thanks for offering to review, Danny.

Thanks also for pointing out that KCL can de-aggregate records aggregated
by KPL. Several applications I've worked on batch multiple records without
using the KPL unfortunately.

Is de-aggregation supported by the Kinesis Connector Source? I found
mention of aggregation only in the FlinkKinesisProducer when searching
online for this feature.

On Thu, Apr 14, 2022 at 12:51 AM Danny Cranmer <dannycran...@apache.org>
wrote:

> Just to clarify, the native KCL/KPL aggregation [1] handles the partition
> key rebalancing for you out of the box.
>
>
> [1] https://docs.aws.amazon.com/streams/latest/dev/kinesis
> -kpl-concepts.html#kinesis-kpl-concepts-aggretation
>
> On Thu, Apr 14, 2022 at 8:48 AM Danny Cranmer <dannycran...@apache.org>
> wrote:
>
> > Hey Blake,
> >
> > I am happy to take a look, but I will not have capacity until next week.
> >
> > The current way to achieve multiple records per PUT is to use the native
> > KCL/KPL aggregation [1], which is supported by the Flink connector. A
> > downside of aggregation is that the sender has to manage the partitioning
> > strategy. For example, each record in your list will be sent to the same
> > shard. If the sender implements grouping of records by partition key,
> then
> > care needs to be taken during shard scaling.
> >
> > Thanks,
> >
> > [1]
> >
> https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html#kinesis-kpl-concepts-aggretation
> >
> >
> > On Tue, Apr 12, 2022 at 3:52 AM Blake Wilson <bl...@yellowpapersun.com>
> > wrote:
> >
> >> Hello, I recently submitted a pull request to support the Collector API
> >> for
> >> the Kinesis Streams Connector.
> >>
> >> The ability to use this API would save a great deal of shuttling bytes
> >> around in multiple Flink programs I've worked on. This is because to
> >> construct a stream of the desired type without Collector support, the
> >> Kinesis source must emit a List[Type], and this must be flattened to a
> >> Type
> >> stream.
> >>
> >> Because of the way Kinesis pricing works, it rarely makes sense to send
> >> one
> >> value per Kinesis record. In provisioned mode, Kinesis PUTs are priced
> to
> >> the nearest 25KB (https://aws.amazon.com/kinesis/data-streams/pricing/
> ),
> >> so
> >> records are more sensibly packed with multiple values unless these
> values
> >> are quite large. Therefore, I suspect the need to handle multiple values
> >> per Kinesis record is quite common.
> >>
> >> The PR is located at https://github.com/apache/flink/pull/19417, and
> I'd
> >> love to get some feedback on Github or here.
> >>
> >> Thanks!
> >>
> >
>

Reply via email to