Re: Anomalous spikes in aggregations of keyed data

2020-11-30 Thread Kegel, Mark
At the moment we checkpoint every minute. I can turn this frequency down, but I’m not sure that will fix or hide the issue. Mark From: Arvid Heise Date: Monday, November 30, 2020 at 2:33 PM To: Kegel, Mark Cc: user@flink.apache.org Subject: Re: Anomalous spikes in aggregations of keyed data Hi

Re: Anomalous spikes in aggregations of keyed data

2020-11-30 Thread Arvid Heise
Hi Mark, could you double-check whether these spikes co-occur with checkpointing? If there is an alignment, certain channels are blocked from taking in data. If all keys are more or less contained in a shard with less data, it would explain why only these keys are affected. On Mon, Nov 30, 2020 at 9:27 PM Keg
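
One rough way to run the check suggested above is to compare spike timestamps against checkpoint intervals. The sketch below is plain Python, not Flink code. The function name `spikes_during_checkpoints`, the timestamps, and the (start, end) interval format are all hypothetical; real values would have to be collected separately, for example from the job's checkpoint history.

```python
def spikes_during_checkpoints(spike_times, checkpoints):
    """Return the spike timestamps that fall inside any (start, end) checkpoint interval."""
    return [
        t for t in spike_times
        if any(start <= t <= end for start, end in checkpoints)
    ]


# Hypothetical data, in seconds since job start (not taken from the thread).
checkpoints = [(0, 5), (60, 66), (120, 124)]
spikes = [3, 40, 61, 130]

overlapping = spikes_during_checkpoints(spikes, checkpoints)
print(overlapping)
```

If most spikes land inside checkpoint intervals, that would be consistent with the alignment explanation. It would not prove it.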

Anomalous spikes in aggregations of keyed data

2020-11-30 Thread Kegel, Mark
We have a high-volume (600-700 shards) Kinesis data stream that we are doing a simple keying and aggregation on. The logic is very simple: Kinesis source, key by fields (A,B,C), window (1-minute, tumbling), aggregate by summing over integer field R, connect to sink. We are seeing some anomalous
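
To make the described logic concrete, here is a minimal plain-Python sketch of keying by (A, B, C) and summing R per 1-minute tumbling window. It is not the poster's Flink job and uses no Flink API. The record layout, the `ts` field (event time in milliseconds), and the sample values are assumptions made for illustration.

```python
from collections import defaultdict

WINDOW_MS = 60_000  # 1-minute tumbling window


def window_start(ts_ms, size_ms=WINDOW_MS):
    """Start of the tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def aggregate(records):
    """Sum field R per (A, B, C) key and per 1-minute tumbling window."""
    sums = defaultdict(int)
    for rec in records:
        key = (rec["A"], rec["B"], rec["C"])
        sums[(key, window_start(rec["ts"]))] += rec["R"]
    return dict(sums)


# Hypothetical records; "ts" is an assumed timestamp field in milliseconds.
records = [
    {"A": "a1", "B": "b1", "C": "c1", "R": 3, "ts": 1_000},
    {"A": "a1", "B": "b1", "C": "c1", "R": 4, "ts": 59_999},
    {"A": "a1", "B": "b1", "C": "c1", "R": 5, "ts": 60_000},
    {"A": "a2", "B": "b1", "C": "c1", "R": 7, "ts": 30_000},
]

result = aggregate(records)
for (key, start), total in sorted(result.items()):
    print(key, start, total)
```

Each record contributes to exactly one window per key, so a record that arrives late and is counted in a later window would show up as a dip in one window and a spike in the next.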