Sorry guys, I've attached the wrong link for the Jira ticket in the previous email. This is the correct link: https://issues.apache.org/jira/browse/FLINK-12256
On Thu, 18 Apr 2019 at 18:29, Artsem Semianenka <artfulonl...@gmail.com> wrote:

> Thank you guys so much!
>
> You provided me with a lot of helpful information.
> I've created the Jira ticket [1] and added an initial description
> covering only the main purpose of the new feature. A more detailed
> implementation description will be added later.
>
> Hi Rong, to tell the truth, my first idea was to use a predefined
> prefix/postfix for the topic name and look up the mapping between
> topic and schema subject. But the idea of a separate view of a
> logical table with a schema looks more elegant and flexible.
>
> I also thought about other approaches for defining the mapping
> between topic and schema subject in case they have different names:
> define the "subject" as part of the table definition:
>
> Select * from kafka.topic.subject
> or
> Select * from kafka.topic#subject
>
> If the subject is not defined, try to find a subject with the same
> name as the topic.
> If the subject is still not found, take the last message and try to
> infer the schema (retrieve the schema id from the message and get
> the last defined schema).
>
> But I see one disadvantage with all of these approaches: the subject
> name may contain symbols not supported in SQL.
>
> I will investigate how to escape the illegal symbols in the table
> name definition.
>
> Thanks,
> Artsem
>
> [1] https://issues.apache.org/jira/browse/FLINK-11275
>
> On Thu, 18 Apr 2019 at 11:54, Timo Walther <twal...@apache.org> wrote:
>
>> Hi Artsem,
>>
>> having catalog support for the Confluent Schema Registry would be a
>> great addition. Although the implementation of FLIP-30 is still
>> ongoing, we merged the stable interfaces today [0]. This should
>> unblock people from contributing new catalog implementations, so you
>> could already start designing an implementation. The implementation
>> could be unit tested for now, until it can also be registered in a
>> table environment for integration tests/end-to-end tests.
>>
>> I hope we can reuse the existing SQL Kafka connector and SQL Avro
>> format?
>>
>> Looking forward to a JIRA issue and a little design document on how
>> to connect the APIs.
>>
>> Thanks,
>> Timo
>>
>> [0] https://github.com/apache/flink/pull/8007
>>
>> On 18.04.19 at 07:03, Bowen Li wrote:
>>
>>> Hi,
>>>
>>> Thanks Artsem and Rong for bringing up the demand from the user
>>> perspective. A Kafka/Confluent Schema Registry catalog would have a
>>> good use case in Flink. We actually mentioned the potential of the
>>> unified Catalog APIs for Kafka in our talk a couple of weeks ago at
>>> Flink Forward SF [1], and we are glad to learn you are interested in
>>> contributing. I think creating a JIRA ticket linked in FLINK-11275
>>> [2], and starting with discussions and a design, would help advance
>>> the effort.
>>>
>>> The most interesting part of the Confluent Schema Registry, from my
>>> point of view, is the core idea of smoothing real production
>>> experience and the things built around it, including versioned
>>> schemas, schema evolution, compatibility checks, etc. Introducing a
>>> confluent-schema-registry-backed catalog to Flink may also help our
>>> design benefit from those ideas.
>>>
>>> To add on to Dawid's points: I assume the MVP for this project would
>>> be supporting Kafka as streaming tables through the new catalog.
>>> FLIP-30 is for both streaming and batch tables, thus this won't be
>>> blocked by the whole FLIP-30. I think as soon as we finish the table
>>> operation APIs, finalize properties and formats, and connect the
>>> APIs to Calcite, this work can be unblocked. Timo and Xuefu may have
>>> more things to say.
>>>
>>> [1] https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
>>> [2] https://issues.apache.org/jira/browse/FLINK-11275
>>>
>>> On Wed, Apr 17, 2019 at 6:39 PM Jark Wu <imj...@gmail.com> wrote:
>>>
>>>> Hi Rong,
>>>>
>>>> Thanks for pointing out the missing FLIPs on the FLIP main page. I
>>>> added all the missing FLIPs (incl. FLIP-14, FLIP-22, FLIP-29,
>>>> FLIP-30, FLIP-31) to the page.
>>>>
>>>> I also included @xuef...@alibaba-inc.com <xuef...@alibaba-inc.com>
>>>> and @Bowen Li <bowenl...@gmail.com> in the thread, who are familiar
>>>> with the latest catalog design.
>>>>
>>>> Thanks,
>>>> Jark
>>>>
>>>> On Thu, 18 Apr 2019 at 02:39, Rong Rong <walter...@gmail.com> wrote:
>>>>
>>>>> Thanks Artsem for looking into this problem, and thanks Dawid for
>>>>> bringing up the discussion on FLIP-30.
>>>>>
>>>>> We've observed similar scenarios where we would also like to reuse
>>>>> the schema registry for both Kafka streams and the raw ingested
>>>>> Kafka messages in the data lake.
>>>>> FYI, another more catalog-oriented document can be found here [1].
>>>>> I do have one question to follow up on Dawid's point (2): are we
>>>>> suggesting that different Kafka topics (e.g. test-topic-prod,
>>>>> test-topic-non-prod, etc.) are considered as a "view" of a logical
>>>>> table with a schema (e.g. test-topic)?
>>>>>
>>>>> Also, it seems a few of the FLIPs, like the FLIP-30 page, are not
>>>>> linked in the main FLIP Confluence wiki page [2] for some reason.
>>>>> I tried to fix that, but it seems I don't have permission. Maybe
>>>>> someone can also take a look?
>>>>>
>>>>> Thanks,
>>>>> Rong
>>>>>
>>>>> [1] https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>>
>>>>> On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka <artfulonl...@gmail.com> wrote:
>>>>>
>>>>>> Thank you, Dawid!
>>>>>> This is very helpful information. I will keep a close eye on the
>>>>>> updates to FLIP-30 and contribute whenever possible.
>>>>>> I guess I may create a Jira ticket for my proposal in which I
>>>>>> describe the idea and attach an intermediate pull request based
>>>>>> on the current API (just for initial discussion). But the final
>>>>>> pull request will definitely be based on the FLIP-30 API.
>>>>>>
>>>>>> Best regards,
>>>>>> Artsem
>>>>>>
>>>>>> On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Artsem,
>>>>>>>
>>>>>>> I think it totally makes sense to have a catalog for the Schema
>>>>>>> Registry. It is also good to hear you want to contribute that.
>>>>>>> There are a few important things to consider though:
>>>>>>>
>>>>>>> 1. The Catalog interface is currently under rework. You may take
>>>>>>> a look at the corresponding FLIP-30 [1], and also have a look at
>>>>>>> the first PR that introduces the basic interfaces [2]. I think it
>>>>>>> would be worth considering those changes already. I cc Xuefu,
>>>>>>> who is participating in the efforts of Catalog integration.
>>>>>>>
>>>>>>> 2. There is still an ongoing discussion about what properties we
>>>>>>> should store for streaming tables, and how. I think this might
>>>>>>> affect (but maybe doesn't have to) the design of the Catalog [3].
>>>>>>> I cc Timo, who might give more insight into whether those should
>>>>>>> be blocking for the work around this Catalog.
>>>>>>> Best,
>>>>>>>
>>>>>>> Dawid
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
>>>>>>> [2] https://github.com/apache/flink/pull/8007
>>>>>>> [3] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.egn858cgizao
>>>>>>>
>>>>>>> On 16/04/2019 17:35, Artsem Semianenka wrote:
>>>>>>>
>>>>>>>> Hi guys!
>>>>>>>>
>>>>>>>> I'm working on an External Catalog for Confluent Kafka. The
>>>>>>>> main idea is to register an external catalog which provides the
>>>>>>>> list of Kafka topics and to execute SQL queries like:
>>>>>>>> Select * from kafka.topic_name
>>>>>>>>
>>>>>>>> I'm going to retrieve the table schema from the Confluent
>>>>>>>> Schema Registry. The main disadvantage is that the topic must
>>>>>>>> have the same name (prefixes and postfixes are accepted) as the
>>>>>>>> schema subject in the Schema Registry.
>>>>>>>> For example:
>>>>>>>> topic: test-topic-prod
>>>>>>>> schema subject: test-topic
>>>>>>>>
>>>>>>>> I would like to contribute this solution to the main Flink
>>>>>>>> branch and would like to discuss the pros and cons of this
>>>>>>>> approach.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Artsem

--
Best regards,
Artsem Semianenka
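The fallback chain Artsem describes earlier in the thread (explicit subject → subject named like the topic → schema inferred from the last message on the topic) could be sketched roughly as below. This is only an illustration of the lookup order, not a proposed Flink API: the class and method names are hypothetical, and the registry and topic lookups are stubbed out as plain functions rather than real Confluent client calls.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of the subject-resolution fallback chain. */
public class SubjectResolver {
    // Stand-in for a Schema Registry "does this subject exist?" lookup.
    private final Function<String, Boolean> subjectExists;
    // Stand-in for reading the last message on a topic and extracting its subject.
    private final Function<String, Optional<String>> inferFromLastMessage;

    public SubjectResolver(Function<String, Boolean> subjectExists,
                           Function<String, Optional<String>> inferFromLastMessage) {
        this.subjectExists = subjectExists;
        this.inferFromLastMessage = inferFromLastMessage;
    }

    /**
     * Resolves the schema subject for a table reference:
     * 1. use the explicitly given subject, if any;
     * 2. otherwise, try a subject with the same name as the topic;
     * 3. otherwise, try to infer it from the last message on the topic.
     */
    public Optional<String> resolve(String topic, String explicitSubject) {
        if (explicitSubject != null) {
            return Optional.of(explicitSubject);
        }
        if (subjectExists.apply(topic)) {
            return Optional.of(topic);
        }
        return inferFromLastMessage.apply(topic);
    }
}
```

The stubbed functions make the order of the three strategies explicit and keep the sketch testable without a running registry.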
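The prefix/postfix matching from Artsem's original proposal (topic "test-topic-prod" mapped to subject "test-topic") could be sketched as a simple containment check against the list of registered subjects, preferring the longest match. Again, the class name is hypothetical and this is only one possible reading of "prefix and postfix are accepted":

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of prefix/postfix matching between a Kafka topic
 * name and a Schema Registry subject: the topic name may surround the
 * subject name with an optional prefix and/or postfix.
 */
public class SubjectMatcher {
    /** Picks the longest registered subject contained in the topic name. */
    public static Optional<String> matchSubject(List<String> subjects, String topic) {
        return subjects.stream()
                .filter(topic::contains)           // subject appears inside the topic name
                .max(Comparator.comparingInt(String::length)); // prefer the most specific match
    }
}
```

Preferring the longest match avoids a short subject like "test" shadowing the more specific "test-topic" for topic "test-topic-prod".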