Hi Ryan,

Thanks for your input.

I think the Flink connector API is relatively stable now compared to previous versions. We have verified the latest Iceberg connector against the upcoming 1.16 release, and it works well. Long-term API stability is still something to work on, and we should have a workflow or mechanism that guarantees it from the external connector side. We will come up with a proposal for an API compatibility guarantee workflow/mechanism, plus a best practice and a PoC for multi-version support.

We are willing to join the Iceberg community to improve/refactor the connector and deliver a better experience for its users. How about holding off on the vote for a bit and waiting until we reach a conclusion in this discussion?

Best,
Jark

On Tue, 25 Oct 2022 at 03:55, Ryan Blue <b...@tabular.io> wrote:
> I don't think we want to talk about the Flink community accepting the Iceberg connector just yet. The goal of Abid's exploration is to see what it would look like as an external connector. We'd need to decide in the Iceberg community whether that's something we'd want to do long term. If it were me, I'd probably say wait until the connector APIs are stable and there is a best practice for releasing.
>
> Ryan
>
> On Mon, Oct 24, 2022 at 11:16 AM Martijn Visser <martijnvis...@apache.org> wrote:
>>
>> Hi all,
>>
>> There are many valid points raised in this discussion thread, but I think we should not mix up different topics. From my perspective, there are two things ongoing:
>>
>> 1. This thread is about the Flink community accepting the Iceberg connector, with various maintainers from Iceberg volunteering to help with the maintenance of the connector itself.
>> 2. Also included in this thread are discussions about the externalization of connectors from Flink. There have been recent discussions on this [1], there is engineering activity happening on that topic, and it is a big focus point for the next couple of weeks/months.
>>    Regarding the differing opinions: I don't actually see those on the mailing list, because after the discussions, the votes have passed.
>>
>> Best regards,
>>
>> Martijn
>>
>> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>>
>> On Fri, Oct 21, 2022 at 3:01 AM Jark Wu <imj...@gmail.com> wrote:
>>>
>>> Hi Abid and all,
>>>
>>> I added the Iceberg dev community for a wider discussion.
>>>
>>> I agree with Yuxia and have the same concern as Steven Wu.
>>>
>>> There were long discussions around externalizing connectors, with many different opinions. If I remember correctly [1][2], in the end we decided to externalize Elasticsearch as an example, to see how it works and what we can standardize (e.g., docs, releases, versions, CI). Once everything works well, we can externalize other connectors.
>>>
>>> However, from what I see, the externalized Elasticsearch connector is currently still at an early stage and has not released any versions. It looks like we still don't have a mature workflow. It's also not clear to me how much the maintenance burden has increased. Is this a scalable way to support dozens of connectors? Does the community have enough resources/committers to merge PRs? How much does it affect contributors when a connector is not in the main repo?
>>>
>>> IMO, the Iceberg connector is a very important connector for the Flink ecosystem. It's a mature connector and many users like it! I hope it can have a better future. However, the externalization workflow is still evolving and under verification. It might not be the best place for popular connectors at the current point in time.
>>>
>>> As for the reasons for moving the Iceberg connector that Abid mentioned:
>>> 1) API stability, to reduce the maintenance of multiple versions.
>>> 2) Flink experts to help maintain the connector.
>>>
>>> I think the move doesn't help much with the API issues, because the connector is still in a separate repo. On the contrary, the connector would have to struggle with additional API issues from the Iceberg project. Besides, the connector may need to maintain 6 more versions (3x3 vs. 3), which is unmaintainable. Actually, the Flink API has become stable in recent versions. We have also verified the latest Iceberg connector on the upcoming 1.16 release, and it works well. The Flink community has also proposed FLIPs [3][4] for API stability guarantees. On the other hand, I also don't like a version matrix of modules/branches. We use a shim layer in flink-connector-hive to support different Hive versions with only one module. We have similar practices in flink-cdc-connectors [5], along with end-to-end tests that guarantee compatibility with different Flink versions [6]. The maintenance cost has been acceptable to us so far.
>>>
>>> In short, I think we have ways to solve the API issues, and the Flink API is becoming stable. As for Flink experts, Yuxia is the component owner of flink-connector-hive and has plenty of knowledge of cross-version compatibility. He is willing to join the Iceberg community to help improve the version problem and maintain the connector. What do you think?
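The shim-layer approach mentioned above for flink-connector-hive can be illustrated with a small, hypothetical Java sketch: connector code is written against one interface, and a loader picks a version-specific implementation from the runtime version string, so a single module serves several dependency versions. The names `ShimDemo`, `EngineShim`, `Shim115`, `Shim116`, and `load` are invented for illustration; they are not taken from flink-connector-hive or from any Flink API.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of a version shim layer. None of these names come from
// flink-connector-hive or from Flink itself.
public class ShimDemo {

    // The single interface the connector code is written against. Each method
    // wraps an operation whose underlying API differs between dependency versions.
    interface EngineShim {
        String describeTable(String table);
    }

    // Implementation for a (hypothetical) 1.15 line of the dependency.
    static final class Shim115 implements EngineShim {
        @Override
        public String describeTable(String table) {
            // A real shim would call the 1.15-era API here.
            return "legacy-api:" + table;
        }
    }

    // Implementation for a (hypothetical) 1.16 line of the dependency.
    static final class Shim116 implements EngineShim {
        @Override
        public String describeTable(String table) {
            // A real shim would call the 1.16-era API here.
            return "new-api:" + table;
        }
    }

    // Registry keyed by "major.minor": supporting another version means adding
    // one class and one entry, not a new module or branch.
    private static final Map<String, Supplier<EngineShim>> SHIMS = Map.of(
            "1.15", Shim115::new,
            "1.16", Shim116::new);

    // Selects the shim matching the runtime version, e.g. "1.16.0" -> Shim116.
    static EngineShim load(String runtimeVersion) {
        String[] parts = runtimeVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed version: " + runtimeVersion);
        }
        Supplier<EngineShim> factory = SHIMS.get(parts[0] + "." + parts[1]);
        if (factory == null) {
            throw new IllegalArgumentException("Unsupported version: " + runtimeVersion);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        EngineShim shim = load("1.16.0");
        System.out.println(shim.describeTable("db.events")); // prints "new-api:db.events"
    }
}
```

In this sketch, the version matrix collapses into a lookup table inside one artifact, which is the property contrasted above with maintaining a 3x3 grid of modules or branches. A real shim layer would also need each implementation compiled against its own dependency version, which this sketch does not model.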
>>>
>>> Best,
>>> Jark
>>> Ververica (Alibaba)
>>>
>>> [1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
>>> [2] https://lists.apache.org/thread/9mzxnl4948ddq07f980mmzoz0c9stnlb
>>> [3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
>>> [4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
>>> [5] https://github.com/ververica/flink-cdc-connectors/
>>> [6] https://github.com/ververica/flink-cdc-connectors/blob/master/flink-cdc-e2e-tests/src/test/java/com/ververica/cdc/connectors/tests/utils/FlinkContainerTestEnvironment.java#L124
>>>
>>> On Thu, 20 Oct 2022 at 22:41, Jing Ge <j...@ververica.com> wrote:
>>>
>>>> I agree with Steven Wu that those points apply to every externalized connector. So they are really concerns about externalizing connector development in general, and those have already been discussed, with consensus reached to do it.
>>>>
>>>> Regarding the 3x3 concern, I think the concept [1] proposed by Chesnay and voted on at [2] could help you.
>>>>
>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
>>>> [2] https://lists.apache.org/thread/7qr8jc053y8xpygcwbhlqq4r7c7fj1p3
>>>>
>>>> Best regards,
>>>> Jing
>>>>
>>>> On Thu, Oct 20, 2022 at 3:46 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> Yuxia, those are valid points. But they apply to every connector (not just Iceberg).
>>>>>
>>>>> I also expressed a similar concern in the discussion thread "Externalized connector release details&workflow". My main concern is the multiplication factor of two upstream projects (Flink & storage/Iceberg). If we limit both to two versions, it will be 2x2, which might still be OK.
>>>>> But if we need to do 3x3, that will probably be too many to manage.
>>>>>
>>>>> On Thu, Oct 20, 2022 at 5:27 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
>>>>>
>>>>>> Hi abmo and Abid!
>>>>>> Thank you both for driving this.
>>>>>>
>>>>>> As Iceberg becomes more and more popular and is an important upstream/downstream system for Flink, I believe the Flink community pays a lot of attention to Iceberg and hopes to be closer to the Iceberg community. Whether or not the connector moves under the Flink umbrella, I believe Flink experts will be glad to give feedback to Iceberg and take part in the development of the Iceberg Flink connector.
>>>>>>
>>>>>> Personally, as a Flink contributor and the main maintainer of the Hive Flink connector, I'm really glad to take part in the Iceberg community for the maintenance and future development of the Iceberg Flink connector. I think I can provide some views from the Flink side and bring feedback from the Iceberg community to the Flink community.
>>>>>>
>>>>>> But I have some concerns about moving the connector from the Iceberg repository to a separate connector under the Flink umbrella:
>>>>>>
>>>>>> 1: If Iceberg develops new features, the Iceberg Flink connector has to wait for an Iceberg release before it can start developing and releasing support for them. Users may therefore have to wait much longer before they can use new Iceberg features from Flink.
>>>>>>
>>>>>> 2: If we move it to a separate repository, I'm afraid it will lose attention from both the Flink and Iceberg sides, which would definitely harm both the Flink and Iceberg communities.
>>>>>> What's more, whenever Flink or Iceberg releases a version, we need to update the version in the separate repository, which I think is tedious and easily forgotten.
>>>>>>
>>>>>> Sorry for raising a different voice in this discussion, but I think it deserves further discussion on the dev mailing list; at the very least, it will help draw Flink developers' attention to Iceberg.
>>>>>>
>>>>>> Best regards,
>>>>>> Yuxia
>>>>>>
>>>>>> ----- Original Message -----
>>>>>> From: "abmo work" <abmo.w...@icloud.com.INVALID>
>>>>>> To: "dev" <dev@flink.apache.org>
>>>>>> Sent: Thursday, October 20, 2022, 6:33:40 AM
>>>>>> Subject: Re: [Discuss]- Donate Iceberg Flink Connector
>>>>>>
>>>>>> Hi Martijn,
>>>>>>
>>>>>> I created a FLIP for this: FLIP-267: Iceberg Connector <https://cwiki.apache.org/confluence/display/FLINK/FLIP+267:+Iceberg+Connector>
>>>>>>
>>>>>> Please let me know if anything else is needed. My email on Confluence is abmo.w...@icloud.com.
>>>>>>
>>>>>> As 1.0 was released today, from the Iceberg perspective we need to figure out which versions of Flink we will support, and the release timeline for when the connector will be built and released from the new repo rather than from Iceberg.
>>>>>>
>>>>>> Thanks
>>>>>> Abid
>>>>>>
>>>>>>> On Oct 19, 2022, at 12:43 PM, Martijn Visser <martijnvis...@apache.org> wrote:
>>>>>>>
>>>>>>> Hi Abid,
>>>>>>>
>>>>>>> We should have a FLIP, as this would be a code contribution. If you provide your Confluence user name, we can grant you access to create one.
>>>>>>>
>>>>>>> Is anything also needed from the Iceberg side to agree to the code contribution?
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Martijn
>>>>>>>
>>>>>>> On Wed, Oct 19, 2022 at 19:11, <abmo.w...@icloud.com.invalid> wrote:
>>>>>>>
>>>>>>>> Thanks Martijn!
>>>>>>>>
>>>>>>>> Thanks for all the support and positive responses. I will start a vote thread and send it out to the dev list.
>>>>>>>>
>>>>>>>> Also, we need help creating a new repo for the Iceberg connector.
>>>>>>>>
>>>>>>>> Can someone help with the creation of a repo? Please let me know if I need to create an issue or a FLIP for that. Following the naming of other connectors, I propose https://github.com/apache/flink-connector-iceberg (doesn't exist)
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>> Abid
>>>>>>>>
>>>>>>>> On 2022/10/19 08:41:02 Martijn Visser wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Thanks for the info, and also thanks to Peter and Steven for offering to volunteer. I think that's a great idea and a necessity.
>>>>>>>>>
>>>>>>>>> Overall +1 given the current ideas to make this contribution happen.
>>>>>>>>>
>>>>>>>>> BTW, congrats on reaching Iceberg 1.0, a great accomplishment :)
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Martijn
>>>>>>>>>
>>>>>>>>> On Tue, Oct 18, 2022 at 12:31 AM Steven Wu <st...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> I was one of the maintainers of the Flink Iceberg connector in the Iceberg repo. I can volunteer as one of the initial maintainers if we decide to move forward.
>>>>>>>>>>
>>>>>>>>>> On Mon, Oct 17, 2022 at 3:26 PM <ab...@icloud.com.invalid> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Martijn,
>>>>>>>>>>>
>>>>>>>>>>> Yes, it is considered a connector in Flink terms.
>>>>>>>>>>>
>>>>>>>>>>> We wanted to join the Flink connector externalization effort so that we can bring the Iceberg connector closer to the Flink community. We hope that any issues with the APIs used by the Iceberg connector will surface sooner and get more attention from the Flink community when the connector is under the Flink umbrella rather than in the Iceberg repo. We also want better feedback from Flink experts on whether a given change belongs in the connector or in Flink itself.
>>>>>>>>>>>
>>>>>>>>>>> Thanks everyone for all your responses! Looking forward to the next steps.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Abid
>>>>>>>>>>>
>>>>>>>>>>> On 2022/10/14 03:37:09 Jark Wu wrote:
>>>>>>>>>>>> Thanks, Abid, for starting the discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm also fine with maintaining it under the Flink project. But I'm also interested in the response to Martijn's question.
>>>>>>>>>>>>
>>>>>>>>>>>> Besides, once the code is moved to the Flink project, can we find any initial maintainers for the connector? In addition, will we still maintain the documentation under Iceberg at https://iceberg.apache.org/docs/latest/flink/ ?
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Jark
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 13 Oct 2022 at 17:52, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> +1. Thanks for driving it. I hope I can find chances to take part in the future development of the Iceberg Flink connector.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>> Yuxia
>>>>>>>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>> From: "Zheng Yu Chen" <ja...@gmail.com>
>>>>>>>>>>>>> To: "dev" <de...@flink.apache.org>
>>>>>>>>>>>>> Sent: Thursday, October 13, 2022, 11:26:29 AM
>>>>>>>>>>>>> Subject: Re: [Discuss]- Donate Iceberg Flink Connector
>>>>>>>>>>>>>
>>>>>>>>>>>>> +1, thanks for driving it
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Oct 10, 2022 at 09:22, Abid Mohammed <ab...@icloud.com.invalid> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I would like to start a discussion about contributing the Iceberg Flink Connector to Flink.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I created a doc <https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing> with all the details, following the Flink Connector template, as I don't have permissions to create a FLIP yet. High-level details are captured below:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Motivation:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This FLIP aims to contribute the existing Apache Iceberg Flink Connector to Flink.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala using a high-performance table format that works just like a SQL table.
>>>>>>>>>>>>>> Iceberg avoids unpleasant surprises. Schema evolution works and won't inadvertently un-delete data. Users don't need to know about partitioning to get fast queries. Iceberg was designed to solve correctness problems in eventually-consistent cloud object stores.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Iceberg supports both Flink's DataStream API and Table API. Based on the guideline of the Flink community, only the latest 2 minor versions are actively maintained. See Multi-Engine Support#apache-flink for further details.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The Iceberg connector supports:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> • Source: detailed Source design <https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#>, based on FLIP-27
>>>>>>>>>>>>>> • Sink: detailed Sink design and interfaces used <https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#>
>>>>>>>>>>>>>> • Usable in both DataStream and Table API/SQL
>>>>>>>>>>>>>> • DataStream read/append/overwrite
>>>>>>>>>>>>>> • SQL create/alter/drop table, select, insert into, insert overwrite
>>>>>>>>>>>>>> • Streaming or batch read in Java API
>>>>>>>>>>>>>> • Support for Flink's Python API
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> See Iceberg Flink <https://iceberg.apache.org/docs/latest/flink/#flink> for detailed usage instructions.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Looking forward to the discussion!
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks
>>>>>>>>>>>>>> Abid
>>>>>>>
>>>>>>> --
>>>>>>> Martijn
>>>>>>> https://twitter.com/MartijnVisser82
>>>>>>> https://github.com/MartijnVisser
>
> --
> Ryan Blue
> Tabular