Hi Mark,

could you double check if these spikes co-occur with checkpointing? If
there is an alignment, certain channels are blocked from taking in data. If
all keys are more or less contained in a shard with less data, it would why
only these keys are affected.

On Mon, Nov 30, 2020 at 9:27 PM Kegel, Mark <mark.ke...@disneystreaming.com>
wrote:

> We have a high volume (600-700 shards) kinesis data stream that we are
> doing a simple keying and aggregation on. The logic is very simple: kinesis
> source, key by fields (A,B,C), window (1-minute, tumbling), aggregate by
> summing over integer field R, connect to sink.
>
>
>
> We are seeing some anomalous spikes in our aggregations. From one minute
> to the next, the sum total for one particular key may increase 25x or more
> and then drop back down to a normal level, yet sums for other keys in the
> same window remain roughly the same, which we expect.
>
>
>
> We don’t see this too often. Maybe 1-5 data points (key + timestamp) in an
> hour’s worth of 1-minute windowed data will have these spikes. The data has
> fairly low cardinality. There are only roughly two hundred distinct keys.
>
>
>
> We inspected the raw kinesis stream and found no duplicates. It isn’t
> clear how these spikes could happen or what we might do to work around the
> issue since the code is as idiomatic as possible.
>
>
>
> We are running the job as part of Kinesis Data Analytics, which is using
> Flink version 1.8. To connect to Kinesis we are using the
> amazon-kinesis-connection-flink library (v1.0.4) library and the EFO
> consumer mode.
>
>
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng

Reply via email to