Hi there,

I recently ran into problems with a Flink job running on an EMR cluster consuming events from a Kinesis stream receiving roughly 15k event/second. Although the EMR cluster was substantially scaled and CPU utilization and system load were well below any alarming threshold, the processing of events of the stream increasingly fell behind.

Eventually, it turned out that the SHARD_GETRECORDS_MAX defaults to 100 which is apparently causing too much overhead when consuming events from the stream. Increasing the value to 5000, a single GetRecords call to Kinesis can retrieve up to 10k records, made the problem go away.

I wonder why the default value for SHARD_GETRECORDS_MAX is chosen so low (100x less than it could be). The Kinesis Client Library defaults to 5000 and it's recommended to use this default value: http://docs.aws.amazon.com/streams/latest/dev/troubleshooting-consumers.html#consumer-app-reading-slower.

Thanks for the clarification!

Cheers,
Steffen

Reply via email to