[ https://issues.apache.org/jira/browse/FLINK-6365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16094156#comment-16094156 ]
Bowen Li commented on FLINK-6365:
---------------------------------

Ok.

For {{SHARD_GETRECORDS_MAX}}, {{10,000}} it is, since we all agree on the value. We tested it in our prod environment, and it works well, greatly reducing the number of requests to Kinesis.

For {{SHARD_GETRECORDS_INTERVAL_MILLIS}}, I second [~sthm]'s proposal. In practice, I set that value to 2,000ms (yes, 2sec) for our prod Flink job, because 0ms overwhelmed our 36-shard Kinesis stream, and setting {{SHARD_GETRECORDS_MAX}} to 10,000 makes up for the longer interval.

I'm also evaluating it theoretically in terms of its relationship to {{# parallelism of Flink datasource stream}} (1) and {{# shards in kinesis stream}} (2).

* When (1) = (2), 1 parallel Flink source operation reads from exactly 1 Kinesis shard. So 200ms is much better than 0ms, because 200ms lets the Flink source read at max speed without exceeding read capacity.
* When (1) > (2), some (or all) Kinesis shards are read by more than 1 parallel Flink source. 200ms is still better than 0ms, because a) 200ms guarantees a shard receives at least 5 requests/sec if that shard is read by 1 Flink source, and b) 200ms can greatly lower the number of requests if that shard is read by more than 1 Flink source, and lower Flink's read latency.
* When (1) < (2), some (or all) Flink sources read from more than 1 Kinesis shard. 200ms probably cannot unleash some shards' potential, and a shorter interval seems more reasonable. However, 0ms is still too intense.

In short, 200ms at least makes Flink work, and 0ms does not. Besides, given that Steffen works for AWS, I put more weight on his opinion.

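The request-rate reasoning in this comment can be checked with a little arithmetic. The sketch below is only an illustration, not connector code: the class {{ShardReadRate}} and its method names are hypothetical, it takes the limit of 5 get operations per second per shard from the issue description, and it assumes each reader issues one call per interval with zero call latency, so the results are upper bounds.

```java
/**
 * Back-of-the-envelope check of GetRecords request rates per Kinesis shard.
 * Hypothetical helper for illustration; not part of the Flink Kinesis connector.
 */
final class ShardReadRate {

    /** Per-shard limit on get operations per second, as quoted in the issue description. */
    public static final double SHARD_LIMIT_PER_SEC = 5.0;

    /**
     * Upper bound on GetRecords calls per second hitting one shard,
     * assuming every reader sleeps intervalMillis between calls and
     * the calls themselves take no time.
     */
    public static double requestsPerSecond(long intervalMillis, int readersPerShard) {
        if (intervalMillis <= 0) {
            // An interval of 0ms means back-to-back calls, so the rate is unbounded here.
            throw new IllegalArgumentException("intervalMillis must be positive");
        }
        return readersPerShard * (1000.0 / intervalMillis);
    }

    /** Upper bound on records per second one reader can pull from one shard. */
    public static double maxRecordsPerSecond(long intervalMillis, int maxRecordsPerCall) {
        return requestsPerSecond(intervalMillis, 1) * maxRecordsPerCall;
    }

    /** True if the estimated request rate stays within the per-shard limit. */
    public static boolean withinLimit(long intervalMillis, int readersPerShard) {
        return requestsPerSecond(intervalMillis, readersPerShard) <= SHARD_LIMIT_PER_SEC;
    }

    public static void main(String[] args) {
        // Proposed default: 200ms interval, one reader per shard.
        System.out.println("200ms, 1 reader:   " + requestsPerSecond(200, 1) + " req/s");
        // Prod setting from the comment: 2,000ms interval with 10,000 records per call.
        System.out.println("2000ms, 1 reader:  " + requestsPerSecond(2000, 1) + " req/s");
        System.out.println("2000ms, max 10000: " + maxRecordsPerSecond(2000, 10000) + " records/s");
        // Two readers on the same shard at 200ms would exceed the limit under these assumptions.
        System.out.println("200ms, 2 readers within limit? " + withinLimit(200, 2));
    }
}
```

Under these assumptions, a 200ms interval with one reader sits exactly at the per-shard limit, while the 2,000ms setting stays far below it and relies on the larger {{SHARD_GETRECORDS_MAX}} to keep throughput up.
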
> Adapt default values of the Kinesis connector
> ---------------------------------------------
>
>                 Key: FLINK-6365
>                 URL: https://issues.apache.org/jira/browse/FLINK-6365
>             Project: Flink
>          Issue Type: Improvement
>          Components: Kinesis Connector
>    Affects Versions: 1.2.0
>            Reporter: Steffen Hausmann
>            Assignee: Bowen Li
>            Priority: Minor
>             Fix For: 1.4.0, 1.3.2
>
>
> As discussed in
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Kinesis-connector-SHARD-GETRECORDS-MAX-default-value-td12332.html,
> it seems reasonable to change the default values of the Kinesis connector to
> follow KCL’s default settings. I suggest adapting at least the values for
> SHARD_GETRECORDS_MAX and SHARD_GETRECORDS_INTERVAL_MILLIS.
> As a Kinesis shard is currently limited to 5 get operations per second, you
> can observe high ReadProvisionedThroughputExceeded rates with the current
> default value for SHARD_GETRECORDS_INTERVAL_MILLIS of 0; it seems reasonable
> to increase it to 200. As described in the email thread, it furthermore seems
> desirable to increase the default value for SHARD_GETRECORDS_MAX to 10000.
> The values that are used by the KCL can be found here:
> https://github.com/awslabs/amazon-kinesis-client/blob/master/src/main/java/com/amazonaws/services/kinesis/clientlibrary/lib/worker/KinesisClientLibConfiguration.java
> Thanks for looking into this!
> Steffen

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)