Regarding the two hooks you would like to be available:

> Provide hook to override discovery (not to hit Kinesis from every subtask)

Yes, I think we can easily provide a way to disable shard discovery, for example by setting SHARD_DISCOVERY_INTERVAL_MILLIS to -1. The user would then have to savepoint and restore in order to pick up new shards after a Kinesis stream reshard (which is in practice the best way to bypass the Kinesis API rate limitations). +1 to provide that.
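A rough sketch of how the -1 convention could look from the user side. The sentinel value and the `isDiscoveryEnabled` helper are hypothetical and only illustrate the proposal; they are not existing connector behavior. The property key and the 10000 ms default are written from memory of `ConsumerConfigConstants` and should be checked against the connector source.

```java
import java.util.Properties;

class ShardDiscoveryConfigSketch {
    // Assumed to be the string behind ConsumerConfigConstants.SHARD_DISCOVERY_INTERVAL_MILLIS.
    static final String SHARD_DISCOVERY_INTERVAL_MILLIS = "flink.shard.discovery.intervalmillis";

    // Assumed connector default for the discovery interval.
    static final long DEFAULT_DISCOVERY_INTERVAL_MILLIS = 10000L;

    // Hypothetical sentinel proposed in this thread: -1 means "never rediscover shards".
    static final long DISCOVERY_DISABLED = -1L;

    /** Builds consumer properties with the given shard discovery interval. */
    public static Properties consumerConfig(long discoveryIntervalMillis) {
        Properties config = new Properties();
        config.setProperty(SHARD_DISCOVERY_INTERVAL_MILLIS, Long.toString(discoveryIntervalMillis));
        return config;
    }

    /** Hypothetical check the fetcher could run before scheduling periodic discovery. */
    public static boolean isDiscoveryEnabled(Properties config) {
        long interval = Long.parseLong(
                config.getProperty(
                        SHARD_DISCOVERY_INTERVAL_MILLIS,
                        Long.toString(DEFAULT_DISCOVERY_INTERVAL_MILLIS)));
        return interval != DISCOVERY_DISABLED;
    }
}
```

With discovery disabled this way, the shard list would stay fixed for the lifetime of the job, which is why a savepoint and restore would be needed after a reshard.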
> Provide hook to support custom watermark generation (somewhere around KinesisDataFetcher.emitRecordAndUpdateState)

Per-partition watermark generation on the Kinesis side is slightly more complex than for Kafka, due to how Kinesis’s dynamic resharding works. I think we additionally need to allow new shards to be consumed only after their parent shard is fully read; otherwise “per-shard time characteristics” can be broken by out-of-order consumption across the boundary between a closed parent shard and its child. These JIRAs [1][2] have a bit more detail on the topic.

Otherwise, in general I’m also +1 to providing this in the Kinesis consumer.

Cheers,
Gordon

[1] https://issues.apache.org/jira/browse/FLINK-5697
[2] https://issues.apache.org/jira/browse/FLINK-6349

On 8 February 2018 at 1:48:23 AM, Thomas Weise (t...@apache.org) wrote:

> Provide hook to override discovery (not to hit Kinesis from every subtask)
> Provide hook to support custom watermark generation (somewhere around KinesisDataFetcher.emitRecordAndUpdateState)
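To illustrate the parent-before-child ordering constraint mentioned in the reply, here is a minimal single-process sketch. The class and method names are hypothetical and are not part of the Kinesis consumer. It relies only on the fact that a Kinesis shard can name a parent shard and, after a merge, an adjacent parent shard. It ignores the harder part of the problem, namely that parent and child may be assigned to different subtasks.

```java
import java.util.HashSet;
import java.util.Set;

class ShardOrderingSketch {
    // IDs of closed shards that have been read to their end.
    private final Set<String> fullyReadShards = new HashSet<>();

    /** Records that a closed shard has been consumed completely. */
    public void markFullyRead(String shardId) {
        fullyReadShards.add(shardId);
    }

    /**
     * A child shard may start only once every parent it names is fully read.
     * Either ID may be null: a shard created with the stream has no parents,
     * and a shard produced by a split has no adjacent parent.
     */
    public boolean canStartConsuming(String parentShardId, String adjacentParentShardId) {
        return isDone(parentShardId) && isDone(adjacentParentShardId);
    }

    private boolean isDone(String shardId) {
        return shardId == null || fullyReadShards.contains(shardId);
    }
}
```

Gating children this way keeps records within one lineage of shards in order, which is the property per-shard watermarks would depend on.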