Hi Till, thank you for your reply.

> What you can do, though, is to create a custom operator or use a flatMap
to build your own windowing operator.
Since my stream wouldn't be keyed, does this mean that I would need to use
"Managed Operator State" (aka raw state)?

On Thu, Apr 29, 2021 at 10:34 AM Till Rohrmann <trohrm...@apache.org> wrote:

> Hi Yegor,
>
> If you want to use Flink's keyed windowing logic, then you need to insert
> a keyBy/shuffle operation because Flink currently cannot simply use the
> partitioning of the Kinesis shards. The reason is that Flink needs to group
> the keys into the correct key groups in order to support rescaling of the
> state.
>
> What you can do, though, is to create a custom operator or use a flatMap
> to build your own windowing operator. This operator could then use the
> partitioning of the Kinesis shards by simply collecting the events until
> either 30 seconds or 1000 events are observed.
>
> Cheers,
> Till
>
> On Wed, Apr 28, 2021 at 11:12 AM Yegor Roganov <yegor....@gmail.com>
> wrote:
>
>> Hello
>>
>> To learn Flink I'm trying to build a simple application where I want to
>> save events coming from Kinesis to S3.
>> I want to subscribe to each shard, and within each shard I want to batch
>> for 30 seconds, or until 1000 events are observed. These batches should
>> then be uploaded to S3.
>> What I don't understand is how to key my source on shard id, and do it in
>> a way that doesn't induce unnecessary shuffling.
>> Is this possible with Flink?
>>
>

Reply via email to