I like the single repo with single version idea.

Pros:
- Better discoverability for connectors for AWS services means a better 
experience for Flink users
- Natural placement of AWS-related utils (Credentials, SDK Retry strategy)

Caveats:
- As you mentioned, it is not desirable if we have to evolve the major version 
of the connector just for a change in a single connector (e.g. DynamoDB). 
However, I think it is reasonable to only evolve the major version of the AWS 
connector repo when there are Flink Source/Sink API upgrades or AWS SDK major 
upgrades (probably quite rare). Any new features for individual connectors can 
be collapsed into minor releases.
- An additional callout here is that we should be careful adopting any AWS 
connectors that don't use the AWS SDK directly (e.g. how the Kinesis connector 
used KPL for a long time). In my opinion, any new connectors like that would be 
better placed in their own repositories, otherwise we will have a complex mesh 
of dependencies to manage.

Regards,
Hong




On 21/10/2022, 16:59, "Danny Cranmer" <dannycran...@apache.org> wrote:

    Thanks Chesnay for the suggestion, I will investigate this option.

    Related to the single repo idea, I have considered it in the past. Are you
    proposing we also use a single version between all connectors? If we have a
    single version then it makes sense to combine them in a single repo, if
    they are separate versions, then splitting them makes sense. This was
    discussed last year more generally [1] and the consensus was "we ultimately
    propose to have a single repository per connector".

    Combining all AWS connectors into a single repo with a single version is
    in line with how the AWS SDK works, therefore AWS users are familiar with
    this approach. However it is frustrating that we would have to release all
    connectors to fix a bug or add a feature in one of them. Example: a user is
    using Kinesis Data Streams only (the most popular and mature connector),
    and we evolve the version from 1.x to 2.y (or 1.x to 1.y) for a DynamoDB
    change.

    I am torn and will think some more, but it would be great to hear other
    people's opinions.

    [1] https://lists.apache.org/thread/bywh947r2f5hfocxq598zhyh06zhksrm

    Thanks,
    Danny

    On Fri, Oct 21, 2022 at 3:11 PM Jing Ge <j...@ververica.com> wrote:

    > I agree with Jark. It would be easier for further development and
    > maintenance if all AWS-related connectors and the base module are in the
    > same repo. It might make sense to upgrade the flink-connector-dynamodb to
    > flink-connector-aws and move the other modules including the
    > flink-connector-aws-base into it. The AWS SDK could be managed in
    > flink-connector-aws-base. Any future common connector features could also
    > be developed in the base module.
    >
    > Best regards,
    > Jing
    >
    > On Fri, Oct 21, 2022 at 1:26 PM Jark Wu <imj...@gmail.com> wrote:
    >
    >> How about creating a new repository flink-connector-aws and merging
    >> DynamoDB, Kinesis, and Firehose into it?
    >> This can reduce the maintenance for complex dependencies and make the
    >> release easy.
    >> I think the maintainers of AWS-related connectors are the same people.
    >>
    >> Best,
    >> Jark
    >>
    >> > On 21 Oct 2022, at 17:41, Chesnay Schepler <ches...@apache.org> wrote:
    >> >
    >> > I would not go with 2); I think it'd just be messy.
    >> >
    >> > Here's another option:
    >> >
    >> > Create another repository (aws-connector-base) (following the
    >> externalization model), add it as a sub-module to the downstream
    >> repositories, and make it part of the release process of said connector.
    >> >
    >> > I.e., we never create a release for aws-connector-base, but release it
    >> as part of the connector.
    >> > The main benefit here is that we'd always be able to make changes to
    >> the aws-base code without delaying connector releases.
    >> > I would assume that any added overhead due to _technically_ releasing
    >> the aws code multiple times to be negligible.
    >> >
    >> >
    >> > On 20/10/2022 22:38, Danny Cranmer wrote:
    >> >> Hello all,
    >> >>
    >> >> Currently we have 2 AWS Flink connectors in the main Flink codebase
    >> >> (Kinesis Data Streams and Kinesis Data Firehose) and one new
    >> externalized
    >> >> connector in progress (DynamoDB). Currently all three of these use
    >> common
    >> >> AWS utilities from the flink-connector-aws-base module. Common code
    >> >> includes client builders, property keys, validation, utils etc.
    >> >>
    >> >> Once we externalize the connectors, leaving flink-connector-aws-base
    >> in the
    >> >> main Flink repository will restrict our ability to evolve the
    >> connectors
    >> >> quickly. For example, as part of the DynamoDB connector build we are
    >> >> considering adding a general retry strategy config that can be
    >> leveraged by
    >> >> all connectors. We would need to block on Flink 1.17 for this.
    >> >>
    >> >> In the past we have tried to keep the AWS SDK version consistent across
    >> >> connectors, with the externalization this is more likely to diverge.
    >> >>
    >> >> Option 1: I propose we create a new repository, flink-connector-aws,
    >> which
    >> >> we can move the flink-connector-aws-base module to and create a new
    >> >> flink-connector-aws-parent to manage SDK versions. Each of the
    >> externalized
    >> >> AWS connectors will depend on this new module and parent. Downside is
    >> an
    >> >> additional module to release per Flink version, however I will
    >> volunteer to
    >> >> manage this.
    >> >>
    >> >> Option 2: We can move the flink-connector-aws-base module and create
    >> >> flink-connector-parent within the flink-connector-shared-utils repo [2]
    >> >>
    >> >> Option 3: We do nothing.
    >> >>
    >> >> For option 1+2 we will follow the general externalized connector
    >> versioning
    >> >> strategy and rules.
    >> >>
    >> >> I am inclined towards option 1, and appreciate feedback from the
    >> community.
    >> >>
    >> >> [1] https://github.com/apache/flink/tree/master/flink-connectors/flink-connector-aws-base
    >> >> [2] https://github.com/apache/flink-connector-shared-utils
    >> >>
    >> >> Thanks,
    >> >> Danny
    >> >>
    >> >
    >>
    >>
