Hi Abid and all,

I've added the Iceberg dev mailing list for a wider discussion.
I agree with Yuxia and share Steven Wu's concern. There were long discussions around externalizing connectors, with many different opinions. If I remember correctly [1][2], we eventually agreed to externalize Elasticsearch first as an example, to see how it works and what we can standardize (e.g., docs, releases, versions, CI). Once everything works well, we can externalize other connectors. However, from what I can see, the externalized Elasticsearch connector is still at an early stage and has not released any versions yet. It looks like we still don't have a mature workflow. It's also not clear to me how much the maintenance burden has increased. Is this a scalable way to support dozens of connectors? Does the community have enough resources/committers to review and merge PRs? How much will contributions be affected when the connector is not in the main repo?

IMO, the Iceberg connector is a very important connector for the Flink ecosystem. It's a mature connector and many users like it! I hope it can have a better future. However, the externalization workflow is still evolving and being validated, so it might not be the best home for popular connectors at this point in time.

As for the reasons Abid mentioned for moving the Iceberg connector: 1) API stability, to reduce multi-version maintenance, and 2) Flink experts to help maintain the connector.

For the API issues, I don't think the move helps much, because the connector would still live in a separate repo. On the contrary, the connector would have to deal with additional API issues coming from the Iceberg project. Besides, the connector may need to maintain 6 more versions (3x3 = 9 instead of 3), which is unmaintainable. Actually, the Flink APIs are becoming stable in recent versions. We have also verified the latest Iceberg connector on the upcoming 1.16 release, and it works well. The Flink community has also proposed FLIPs [3][4] for API stability guarantees. On the other hand, I also don't like the version-matrix modules/branches.
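As an illustration of the shim-layer alternative to version-matrix modules/branches, here is a minimal Java sketch, assuming a single connector module that picks a version-specific implementation at runtime from the upstream version string. `HiveShim`, `HiveShimV2`, `HiveShimV3`, `ShimLoader.load`, and the `2.3`/`3.1` keys are hypothetical names for this example only; they are not the actual flink-connector-hive API.

```java
import java.util.Map;

/**
 * Hypothetical sketch of a shim layer: one connector module, with
 * version-specific behavior selected at runtime. All names are illustrative.
 */
public final class ShimLoader {

    /** The single interface the rest of the connector codes against. */
    interface HiveShim {
        String describe();
    }

    /** Would be written against the Hive 2.3.x API (hypothetical). */
    static final class HiveShimV2 implements HiveShim {
        @Override
        public String describe() {
            return "shim for Hive 2.3.x";
        }
    }

    /** Would be written against the Hive 3.1.x API (hypothetical). */
    static final class HiveShimV3 implements HiveShim {
        @Override
        public String describe() {
            return "shim for Hive 3.1.x";
        }
    }

    // Maps a "major.minor" version prefix to the shim that handles it.
    private static final Map<String, HiveShim> SHIMS =
            Map.of("2.3", new HiveShimV2(), "3.1", new HiveShimV3());

    private ShimLoader() {}

    /** Picks a shim from a full version string such as "3.1.2". */
    public static HiveShim load(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed version: " + version);
        }
        HiveShim shim = SHIMS.get(parts[0] + "." + parts[1]);
        if (shim == null) {
            throw new IllegalArgumentException("Unsupported version: " + version);
        }
        return shim;
    }

    public static void main(String[] args) {
        System.out.println(load("2.3.9").describe()); // prints "shim for Hive 2.3.x"
        System.out.println(load("3.1.2").describe()); // prints "shim for Hive 3.1.x"
    }
}
```

The rest of the connector depends only on the `HiveShim` interface, so supporting another upstream version means adding one implementation and one map entry rather than a new module or branch. In a real connector, each implementation may also need reflection to call methods that exist only in some upstream versions; that detail is omitted here.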
We use a shim layer to support different Hive versions in flink-connector-hive with only one module. We have similar practices in flink-cdc-connectors [5], with end-to-end tests that guarantee compatibility with different Flink versions [6]. The maintenance effort has been acceptable to us so far. In short, I think we have ways to solve the API issues, and the Flink APIs are becoming stable.

As for Flink experts, Yuxia is the component owner of flink-connector-hive. He has plenty of knowledge of cross-version compatibility, and he is willing to join the Iceberg community to help improve the versioning problem and maintain the connector.

What do you think?

Best,
Jark
Ververica (Alibaba)

[1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
[2] https://lists.apache.org/thread/9mzxnl4948ddq07f980mmzoz0c9stnlb
[3] https://cwiki.apache.org/confluence/display/FLINK/FLIP-196%3A+Source+API+stability+guarantees
[4] https://cwiki.apache.org/confluence/display/FLINK/FLIP-197%3A+API+stability+graduation+process
[5] https://github.com/ververica/flink-cdc-connectors/
[6] https://github.com/ververica/flink-cdc-connectors/blob/master/flink-cdc-e2e-tests/src/test/java/com/ververica/cdc/connectors/tests/utils/FlinkContainerTestEnvironment.java#L124

On Thu, 20 Oct 2022 at 22:41, Jing Ge <j...@ververica.com> wrote:
> I agree with Steven Wu that those points apply to every externalized
> connector. So those are really concerns about externalizing connector
> development in general, which has already been discussed, and consensus
> has already been reached to do it.
>
> Regarding the 3x3 concern, I think the concept [1] proposed by Chesnay and
> voted on in [2] could help you.
>
> [1] https://cwiki.apache.org/confluence/display/FLINK/Externalized+Connector+development
> [2] https://lists.apache.org/thread/7qr8jc053y8xpygcwbhlqq4r7c7fj1p3
>
> Best regards,
> Jing
>
> On Thu, Oct 20, 2022 at 3:46 PM Steven Wu <stevenz...@gmail.com> wrote:
>
> > Yuxia, those are valid points, but they apply to every connector
> > (not just Iceberg).
> >
> > I also expressed a similar concern in the discussion thread
> > "Externalized connector release details&workflow". My main concern is the
> > multiplication factor of two upstream projects (Flink & storage/Iceberg).
> > If we limit both to two versions, it will be 2x2, which might still be OK,
> > but if we need to do 3x3, that will probably be too many to manage.
> >
> > On Thu, Oct 20, 2022 at 5:27 AM yuxia <luoyu...@alumni.sjtu.edu.cn> wrote:
> >
> > > Hi abmo, Abid!
> > > Thank you for driving this.
> > >
> > > As Iceberg becomes more and more popular and is an important
> > > upstream/downstream system for Flink, I believe the Flink community has
> > > paid a lot of attention to Iceberg and hopes to be closer to the Iceberg
> > > community. Whether or not it's moved under the Flink umbrella, I believe
> > > Flink experts are glad to give feedback to Iceberg and take part in the
> > > development of the Iceberg Flink connector.
> > >
> > > Personally, as a Flink contributor and the main maintainer of the Hive
> > > Flink connector, I'm really glad to take part in the Iceberg community for
> > > the maintenance and future development of the Iceberg Flink connector. I
> > > think I can provide some views from the Flink side and bring feedback
> > > from the Iceberg community to the Flink community.
> > >
> > > But I have some concerns about moving the connector from the Iceberg
> > > repository to a separate connector under the Flink umbrella:
> > >
> > > 1: When Iceberg develops new features, the Iceberg Flink connector would
> > > have to wait for that Iceberg release before starting the development and
> > > release that makes use of the new features. Users may then need to wait
> > > much longer before they can use new Iceberg features from Flink.
> > >
> > > 2: If we move it to a separate repository, I'm afraid it will lose
> > > attention from both the Flink and Iceberg sides, which would definitely
> > > harm both the Flink and Iceberg communities. What's more, whenever Flink
> > > or Iceberg releases a version, we would need to update the version in the
> > > separate repository, which I think is tedious and easily forgotten.
> > >
> > > Sorry for raising a different voice in this discussion, but I think it
> > > deserves further discussion on the dev mailing list; at the very least it
> > > will help draw Flink developers' attention to Iceberg.
> > >
> > > Best regards,
> > > Yuxia
> > >
> > > ----- Original Message -----
> > > From: "abmo work" <abmo.w...@icloud.com.INVALID>
> > > To: "dev" <d...@flink.apache.org>
> > > Sent: Thursday, October 20, 2022, 6:33:40 AM
> > > Subject: Re: [Discuss]- Donate Iceberg Flink Connector
> > >
> > > Hi Martijn,
> > >
> > > I created a FLIP for this: FLIP-267: Iceberg Connector
> > > <https://cwiki.apache.org/confluence/display/FLINK/FLIP+267:+Iceberg+Connector>.
> > > Please let me know if anything else is needed. My email on Confluence is
> > > abmo.w...@icloud.com.
> > >
> > > As 1.0 was released today, from the Iceberg perspective we need to figure
> > > out which Flink versions we will support, and the release timeline for
> > > when the connector will be built and released from the new repo instead
> > > of from Iceberg.
> > >
> > > Thanks
> > > Abid
> > >
> > > > On Oct 19, 2022, at 12:43 PM, Martijn Visser <martijnvis...@apache.org> wrote:
> > > >
> > > > Hi Abid,
> > > >
> > > > We should have a FLIP, as this would be a code contribution. If you
> > > > provide your Confluence user name, we can grant you access to create one.
> > > >
> > > > Is anything also needed from the Iceberg side to agree to the code
> > > > contribution?
> > > >
> > > > Best regards,
> > > >
> > > > Martijn
> > > >
> > > > On Wed, 19 Oct 2022 at 19:11, <abmo.w...@icloud.com.invalid> wrote:
> > > >
> > > >> Thanks Martijn!
> > > >>
> > > >> Thanks for all the support and positive responses. I will start a vote
> > > >> thread and send it out to the dev list.
> > > >>
> > > >> Also, we need help with creating a new repo for the Iceberg connector.
> > > >>
> > > >> Can someone help with creating the repo? Please let me know if I need
> > > >> to create an issue or a FLIP for that.
> > > >> Following the naming of other connectors, I propose
> > > >> https://github.com/apache/flink-connector-iceberg (doesn’t exist)
> > > >>
> > > >> Thanks
> > > >> Abid
> > > >>
> > > >> On 2022/10/19 08:41:02 Martijn Visser wrote:
> > > >>> Hi all,
> > > >>>
> > > >>> Thanks for the info, and thanks Peter and Steven for offering to
> > > >>> volunteer. I think that's a great idea and a necessity.
> > > >>>
> > > >>> Overall +1, given the current ideas to make this contribution happen.
> > > >>>
> > > >>> BTW, congrats on reaching Iceberg 1.0, a great accomplishment :)
> > > >>>
> > > >>> Thanks,
> > > >>>
> > > >>> Martijn
> > > >>>
> > > >>> On Tue, Oct 18, 2022 at 12:31 AM Steven Wu <st...@gmail.com> wrote:
> > > >>>
> > > >>>> I was one of the maintainers of the Flink Iceberg connector in the
> > > >>>> Iceberg repo. I can volunteer as one of the initial maintainers if we
> > > >>>> decide to move forward.
> > > >>>>
> > > >>>> On Mon, Oct 17, 2022 at 3:26 PM <ab...@icloud.com.invalid> wrote:
> > > >>>>
> > > >>>>> Hi Martijn,
> > > >>>>>
> > > >>>>> Yes, it is considered a connector in Flink terms.
> > > >>>>>
> > > >>>>> We wanted to join the Flink connector externalization effort so that
> > > >>>>> we can bring the Iceberg connector closer to the Flink community. We
> > > >>>>> hope that any issues with the APIs used by the Iceberg connector will
> > > >>>>> surface sooner and get more attention from the Flink community when
> > > >>>>> the connector is under the Flink umbrella rather than in the Iceberg
> > > >>>>> repo. We also want better feedback from Flink experts on whether
> > > >>>>> something should be added to a connector or to Flink itself.
> > > >>>>>
> > > >>>>> Thanks everyone for all your responses! Looking forward to the next
> > > >>>>> steps.
> > > >>>>>
> > > >>>>> Thanks
> > > >>>>> Abid
> > > >>>>>
> > > >>>>> On 2022/10/14 03:37:09 Jark Wu wrote:
> > > >>>>>> Thanks Abid for starting the discussion.
> > > >>>>>>
> > > >>>>>> I'm also fine with maintaining it under the Flink project,
> > > >>>>>> but I'm also interested in the response to Martijn's question.
> > > >>>>>>
> > > >>>>>> Besides, once the code is moved to the Flink project, can we find
> > > >>>>>> any initial maintainers for the connector?
> > > >>>>>> In addition, will we still maintain the documentation under Iceberg at
> > > >>>>>> https://iceberg.apache.org/docs/latest/flink/ ?
> > > >>>>>>
> > > >>>>>> Best,
> > > >>>>>> Jark
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, 13 Oct 2022 at 17:52, yuxia <lu...@alumni.sjtu.edu.cn> wrote:
> > > >>>>>>
> > > >>>>>>> +1. Thanks for driving it. I hope I can find some chances to take
> > > >>>>>>> part in the future development of the Iceberg Flink connector.
> > > >>>>>>>
> > > >>>>>>> Best regards,
> > > >>>>>>> Yuxia
> > > >>>>>>>
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Zheng Yu Chen" <ja...@gmail.com>
> > > >>>>>>> To: "dev" <de...@flink.apache.org>
> > > >>>>>>> Sent: Thursday, October 13, 2022, 11:26:29 AM
> > > >>>>>>> Subject: Re: [Discuss]- Donate Iceberg Flink Connector
> > > >>>>>>>
> > > >>>>>>> +1, thanks for driving it.
> > > >>>>>>>
> > > >>>>>>> On Mon, Oct 10, 2022 at 09:22, Abid Mohammed <ab...@icloud.com.invalid> wrote:
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> I would like to start a discussion about contributing the Iceberg
> > > >>>>>>>> Flink Connector to Flink.
> > > >>>>>>>>
> > > >>>>>>>> I created a doc
> > > >>>>>>>> <https://docs.google.com/document/d/1WC8xkPiVdwtsKL2VSPAUgzm9EjrPs8ZRjEtcwv93ISI/edit?usp=sharing>
> > > >>>>>>>> with all the details, following the Flink Connector template, as I
> > > >>>>>>>> don’t have permissions to create a FLIP yet.
> > > >>>>>>>> High-level details are captured below:
> > > >>>>>>>>
> > > >>>>>>>> Motivation:
> > > >>>>>>>>
> > > >>>>>>>> This FLIP aims to contribute the existing Apache Iceberg Flink
> > > >>>>>>>> Connector to Flink.
> > > >>>>>>>>
> > > >>>>>>>> Apache Iceberg is an open table format for huge analytic datasets.
> > > >>>>>>>> Iceberg adds tables to compute engines including Spark, Trino,
> > > >>>>>>>> PrestoDB, Flink, Hive and Impala using a high-performance table
> > > >>>>>>>> format that works just like a SQL table.
> > > >>>>>>>> Iceberg avoids unpleasant surprises. Schema evolution works and
> > > >>>>>>>> won’t inadvertently un-delete data. Users don’t need to know about
> > > >>>>>>>> partitioning to get fast queries.
> > > >>>>>>>> Iceberg was designed to solve correctness problems in
> > > >>>>>>>> eventually-consistent cloud object stores.
> > > >>>>>>>>
> > > >>>>>>>> Iceberg supports both Flink’s DataStream API and Table API. Based
> > > >>>>>>>> on the guideline of the Flink community, only the latest 2 minor
> > > >>>>>>>> versions are actively maintained. See the Multi-Engine Support
> > > >>>>>>>> page (#apache-flink section) for further details.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> The Iceberg connector supports:
> > > >>>>>>>>
> > > >>>>>>>> • Source: detailed Source design
> > > >>>>>>>> <https://docs.google.com/document/d/1q6xaBxUPFwYsW9aXWxYUh7die6O7rDeAPFQcTAMQ0GM/edit#>,
> > > >>>>>>>> based on FLIP-27
> > > >>>>>>>> • Sink: detailed Sink design and interfaces used
> > > >>>>>>>> <https://docs.google.com/document/d/1O-dPaFct59wUWQECXEEYIkl9_MOoG3zTbC2V-fZRwrg/edit#>
> > > >>>>>>>> • Usable in both DataStream and Table API/SQL
> > > >>>>>>>> • DataStream read/append/overwrite
> > > >>>>>>>> • SQL create/alter/drop table, select, insert into, insert overwrite
> > > >>>>>>>> • Streaming or batch read in Java API
> > > >>>>>>>> • Support for Flink’s Python API
> > > >>>>>>>>
> > > >>>>>>>> See Iceberg Flink <https://iceberg.apache.org/docs/latest/flink/#flink>
> > > >>>>>>>> for detailed usage instructions.
> > > >>>>>>>>
> > > >>>>>>>> Looking forward to the discussion!
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>> Abid
> > > >>>>>>>
> > > >>>>>>
> > > >>>>
> > > >>>
> > > >
> > > > --
> > > > Martijn
> > > > https://twitter.com/MartijnVisser82
> > > > https://github.com/MartijnVisser