Hey Gordon,

I have updated the FLIP [1] to include support for configurable registration 
strategies:
- Added 2 additional configuration keys
- Added Registration/De-registration Configuration section
- Updated Stream Consumer Registration/Tear Down section
- Remove rejected alternative (since we are now optionally supporting it)

A slight tweak. I have added a third option, "none". This will disable 
registration/de-registration and allow the user to directly pass in the 
ConsumerARN. This option adds overhead/complexity to the user configuration but 
it will remove all start-up and teardown AWS SDK calls.

Let me know if you are happy to proceed to a vote or have any further feedback.

Thanks,
Danny Cranmer

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-128%3A+Enhanced+Fan+Out+for+AWS+Kinesis+Consumers

On 29/06/2020, 15:28, "Cranmer, Danny" <cranm...@amazon.com.INVALID> wrote:

    Hey Gordon,

    Thank-you for you review and feedback. 

    I agree with your suggestion for the contribution plan. I have updated the 
FLIP to include the additional step with the FanOutKinesisProxy/AWS SDK 2.x 
dependency. I have also added another precursor step to generally improve test 
coverage. I have been playing around in the connector code and discovered that 
restarting consumption from an aggregated record is not covered by 
unit/integration tests. I have written some additional tests with simulated 
Kinesis behaviour, pushing these tests in advance would increase confidence in 
the next contribution step.

    Regarding the consumer de-/registration. I had not considered making it 
optional via configuration as you propose. I agree with your observation on the 
additional configuration complexity and it would also expose internal 
implementation details to the user. That being said, if a user application has 
a very high parallelism they could quite easily exceed the ListStreamConsumers 
quota (5 TPS [1]), and increase the application start-up time substantially 
with back-off delays. The client could conditionally (based on config) register 
the stream consumer and add the ConsumerARN(s) to the consumer properties, 
eliminating the required parallel calls to ListStreamConsumers by the Flink 
tasks. I will update the FLIP to include this change, and reply to this thread 
once it is done.

    [1] 
https://docs.aws.amazon.com/kinesis/latest/APIReference/API_ListStreamConsumers.html

    Thanks,
    Danny,

    On 29/06/2020, 05:19, "Tzu-Li (Gordon) Tai" <tzuli...@apache.org> wrote:

        CAUTION: This email originated from outside of the organization. Do not 
click links or open attachments unless you can confirm the sender and know the 
content is safe.



        Also, if it wasn't clear, I'll be happy to provide committer support on
        reviewing and merging this FLIP, if it gets approved :)

        On Mon, Jun 29, 2020 at 12:12 PM Tzu-Li (Gordon) Tai 
<tzuli...@apache.org>
        wrote:

        > Hi Cranmer,
        >
        > Thank you for proposing the feature and starting the discussion 
thread.
        > This is really great work!
        >
        > Overall, +1 to adding EFO support to the Kinesis connector.
        > I can see that having a dedicated throughput quota for each consuming
        > Flink application is definitely a requirement for AWS users.
        > In the past, we worked around this by using adaptive polling to avoid
        > exceeding the quotas with multiple consumers, this would probably go 
away
        > with this implemented.
        >
        > There are a few things I like about the current proposal:
        > - EFO is an opt-in feature for now. Once we decide to reimplement the
        > Kinesis connector on top of the new source interface (FLIP-27), we can
        > probably consider enabling EFO by default to match the defaults of the
        > higher-level KCL library.
        > - From the design, it seems like the changes can indeed be fairly
        > consolidated in the Kinesis connector. The change should be fairly 
safe as
        > well, since we're essentially abstracting record publishing concerns 
only,
        > which is transparent to the exactly-once semantics / watermark 
components.
        >
        > Concerning competing stream consumer de-/registration:
        > this would most likely go away with the new source interface, where 
this
        > can be done on the source split enumerator.
        > I'm personally okay with the proposed strategy of competing with 
backoff.
        > As food for thought, have you considered an opt-in / opt-out, where 
the
        > user knows that the client has access to KDS, and can choose to 
register
        > once only on the client side instead of competing in TMs?
        > I'm not sure if this is worth the extra configuration complexity 
though.
        >
        > For the concrete next steps for implementation after this FLIP has 
passed:
        > From the FLIP I can conclude that implementation wise, this would 
come in
        > a few steps -
        > 1. Abstract away record publishing behind a new interface. Initially 
we
        > only have one implementation (the current polling mechanism).
        > 2. Add in AWS SDK 2.x dependency, with a new FanOutKinesisProxy
        > implementation.
        > 3. Add FanOutRecordPublisher to finalize EFO support.
        >
        > I think step 2. wasn't explicitly mentioned in the FLIP, but I 
strongly
        > suggest to consolidate that step as a single PR, as we would need to 
do a
        > license check for dependency changes and it would be nice to move 
forward
        > with that with at least interference of code changes as possible.
        >
        > Pushing this FLIP forward for approval:
        > Since this FLIP is a fairly consolidated change, we should be safe to
        > proceed with a vote soon.
        > That usually happens in a separate vote thread, linking to this 
discussion
        > thread.
        >
        > cc'ing Thomas Weise as well, as he has also worked substantially on 
the
        > Kinesis connector in the past.
        >
        > Cheers,
        > Gordon
        >
        > On Tue, Jun 23, 2020 at 6:02 PM hey_wxl <xiaolong.w...@smartnews.com>
        > wrote:
        >
        >> Hi, Cranmer.
        >>
        >>    I'm Roland Wang. I've read the FLIP you wrote, and agree with your
        >> design.
        >>    Recently, I'm working on this feature too, and have made some 
progress:
        >>
        >>    1. I add two methods: getOrRegisterConsumer & subscribeToShard on
        >> KinesisProxyInterface.
        >>    2. I re-implement the KinesisProxy using AWS SDK V2.x.
        >>    3. I use the new KinesisProxy to implement ShardConsumer.
        >>
        >>    Though my design is not fully considered, I hope we can discuss a
        >> little
        >> bit about this feature. I wish to make some contribution to the 
community.
        >>
        >>
        >>
        >>
        >> --
        >> Sent from:
        >> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
        >>
        >


Reply via email to