Sorry guys, I've attached the wrong link for the Jira ticket in the previous email. This is the correct link: https://issues.apache.org/jira/browse/FLINK-12256
On Thu, 18 Apr 2019 at 18:29, Artsem Semianenka <artfulonl...@gmail.com> wrote:

> Thank you guys so much!
>
> You provided me with a lot of helpful information.
> I've created the Jira ticket [1] and added an initial description
> covering only the main purpose of the new feature. A more detailed
> implementation description will be added later.
>
> Hi Rong, to tell the truth, my first idea was to use a predefined
> prefix/postfix for the topic name and look up the mapping between
> topic and schema subject. But the idea of a separate view of a
> logical table with a schema looks more elegant and flexible.
>
> I also thought about other approaches for defining the mapping
> between topic and schema subject in case they have different names:
> define the "subject" as part of the table definition:
>
> Select * from kafka.topic.subject
> or
> Select * from kafka.topic#subject
>
> If the subject is not defined, try to find a subject with the same
> name as the topic.
> If the subject is still not found, take the last message and try to
> infer the schema (retrieve the schema id from the message and get
> the last defined schema).
>
> But I see one disadvantage with all of these approaches: the subject
> name may contain symbols not supported in SQL.
>
> I will investigate how to escape the illegal symbols in the table
> name definition.
>
> Thanks,
> Artsem
>
> [1] https://issues.apache.org/jira/browse/FLINK-11275
>
> On Thu, 18 Apr 2019 at 11:54, Timo Walther <twal...@apache.org> wrote:
>
>> Hi Artsem,
>>
>> having catalog support for the Confluent Schema Registry would be a
>> great addition. Although the implementation of FLIP-30 is still
>> ongoing, we merged the stable interfaces today [0]. This should
>> unblock people from contributing new catalog implementations, so you
>> could already start designing an implementation. The implementation
>> could be unit tested for now, until it can also be registered in a
>> table environment for integration tests/end-to-end tests.
>>
>> I hope we can reuse the existing SQL Kafka connector and SQL Avro
>> format?
>>
>> Looking forward to a JIRA issue and a little design document on how
>> to connect the APIs.
>>
>> Thanks,
>> Timo
>>
>> [0] https://github.com/apache/flink/pull/8007
>>
>> On 18.04.19 at 07:03, Bowen Li wrote:
>>
>>> Hi,
>>>
>>> Thanks Artsem and Rong for bringing up the demand from the user
>>> perspective. A Kafka/Confluent Schema Registry catalog would have a
>>> good use case in Flink. We actually mentioned the potential of the
>>> unified Catalog APIs for Kafka in our talk a couple of weeks ago at
>>> Flink Forward SF [1], and we are glad to learn you are interested in
>>> contributing. I think creating a JIRA ticket linked in FLINK-11275
>>> [2], and starting with discussions and a design, would help advance
>>> the effort.
>>>
>>> The most interesting part of the Confluent Schema Registry, from my
>>> point of view, is the core idea of smoothing real production
>>> experience and the things built around it, including versioned
>>> schemas, schema evolution, compatibility checks, etc. Introducing a
>>> confluent-schema-registry-backed catalog to Flink may also help our
>>> design benefit from those ideas.
>>>
>>> To add on to Dawid's points: I assume the MVP for this project would
>>> be supporting Kafka as streaming tables through the new catalog.
>>> FLIP-30 is for both streaming and batch tables, thus this won't be
>>> blocked by the whole FLIP-30. I think as soon as we finish the table
>>> operation APIs, finalize properties and formats, and connect the
>>> APIs to Calcite, this work can be unblocked. Timo and Xuefu may have
>>> more things to say.
>>>
>>> [1] https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
>>> [2] https://issues.apache.org/jira/browse/FLINK-11275
>>>
>>> On Wed, Apr 17, 2019 at 6:39 PM Jark Wu <imj...@gmail.com> wrote:
>>>
>>>> Hi Rong,
>>>>
>>>> Thanks for pointing out the missing FLIPs on the FLIP main page. I
>>>> added all the missing FLIPs (incl. FLIP-14, FLIP-22, FLIP-29,
>>>> FLIP-30, FLIP-31) to the page.
>>>>
>>>> I also included @xuef...@alibaba-inc.com <xuef...@alibaba-inc.com>
>>>> and @Bowen Li <bowenl...@gmail.com> in the thread, who are familiar
>>>> with the latest catalog design.
>>>>
>>>> Thanks,
>>>> Jark
>>>>
>>>> On Thu, 18 Apr 2019 at 02:39, Rong Rong <walter...@gmail.com> wrote:
>>>>
>>>>> Thanks Artsem for looking into this problem, and thanks Dawid for
>>>>> bringing up the discussion on FLIP-30.
>>>>>
>>>>> We've observed similar scenarios where we would also like to reuse
>>>>> the schema registry for both Kafka streams and the raw ingested
>>>>> Kafka messages in the data lake.
>>>>> FYI, another more catalog-oriented document can be found here [1].
>>>>> I do have one question to follow up on Dawid's point (2): are we
>>>>> suggesting that different Kafka topics (e.g. test-topic-prod,
>>>>> test-topic-non-prod, etc.) are considered as a "view" of a logical
>>>>> table with a schema (e.g. test-topic)?
>>>>>
>>>>> Also, it seems a few of the FLIPs, like the FLIP-30 page, are not
>>>>> linked in the main FLIP Confluence wiki page [2] for some reason.
>>>>> I tried to fix that, but it seems I don't have permission. Maybe
>>>>> someone can also take a look?
>>>>>
>>>>> Thanks,
>>>>> Rong
>>>>>
>>>>> [1] https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
>>>>> [2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>>
>>>>> On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka <artfulonl...@gmail.com> wrote:
>>>>>
>>>>>> Thank you, Dawid!
>>>>>> This is very helpful information. I will keep a close eye on the
>>>>>> updates to FLIP-30 and contribute whenever possible.
>>>>>> I guess I may create a Jira ticket for my proposal in which I
>>>>>> describe the idea and attach an intermediate pull request based
>>>>>> on the current API (just for initial discussion). But the final
>>>>>> pull request will definitely be based on the FLIP-30 API.
>>>>>>
>>>>>> Best regards,
>>>>>> Artsem
>>>>>>
>>>>>> On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz <dwysakow...@apache.org> wrote:
>>>>>>
>>>>>>> Hi Artsem,
>>>>>>>
>>>>>>> I think it totally makes sense to have a catalog for the Schema
>>>>>>> Registry. It is also good to hear you want to contribute that.
>>>>>>> There are a few important things to consider though:
>>>>>>>
>>>>>>> 1. The Catalog interface is currently under rework. You may take
>>>>>>> a look at the corresponding FLIP-30 [1], and also have a look at
>>>>>>> the first PR that introduces the basic interfaces [2]. I think it
>>>>>>> would be worth considering those changes already. I cc Xuefu,
>>>>>>> who is participating in the efforts of Catalog integration.
>>>>>>>
>>>>>>> 2. There is still an ongoing discussion about what properties we
>>>>>>> should store for streaming tables, and how. I think this might
>>>>>>> affect (but maybe doesn't have to) the design of the Catalog [3].
>>>>>>> I cc Timo, who might give more insight into whether those should
>>>>>>> be blocking for the work around this Catalog.
>>>>>>> Best,
>>>>>>>
>>>>>>> Dawid
>>>>>>>
>>>>>>> [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
>>>>>>> [2] https://github.com/apache/flink/pull/8007
>>>>>>> [3] https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.egn858cgizao
>>>>>>>
>>>>>>> On 16/04/2019 17:35, Artsem Semianenka wrote:
>>>>>>>
>>>>>>>> Hi guys!
>>>>>>>>
>>>>>>>> I'm working on an External Catalog for Confluent Kafka. The
>>>>>>>> main idea is to register an external catalog which provides the
>>>>>>>> list of Kafka topics and to execute SQL queries like:
>>>>>>>> Select * from kafka.topic_name
>>>>>>>>
>>>>>>>> I'm going to retrieve the table schema from the Confluent
>>>>>>>> Schema Registry. The main disadvantage is that the topic must
>>>>>>>> have the same name (prefixes and postfixes are accepted) as the
>>>>>>>> schema subject in the Schema Registry.
>>>>>>>> For example:
>>>>>>>> topic: test-topic-prod
>>>>>>>> schema subject: test-topic
>>>>>>>>
>>>>>>>> I would like to contribute this solution to the main Flink
>>>>>>>> branch and would like to discuss the pros and cons of this
>>>>>>>> approach.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Artsem

--
Best regards,
Artsem Semianenka
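The fallback chain Artsem describes earlier in the thread (explicit subject → subject named like the topic → schema inferred from the last message on the topic) could be sketched roughly as below. This is only an illustration of the lookup order, not a proposed Flink API: the class and method names are hypothetical, and the registry and topic lookups are stubbed out as plain functions rather than real Confluent client calls.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical sketch of the subject-resolution fallback chain. */
public class SubjectResolver {
    // Stand-in for a Schema Registry "does this subject exist?" lookup.
    private final Function<String, Boolean> subjectExists;
    // Stand-in for reading the last message on a topic and extracting its subject.
    private final Function<String, Optional<String>> inferFromLastMessage;

    public SubjectResolver(Function<String, Boolean> subjectExists,
                           Function<String, Optional<String>> inferFromLastMessage) {
        this.subjectExists = subjectExists;
        this.inferFromLastMessage = inferFromLastMessage;
    }

    /**
     * Resolves the schema subject for a table reference:
     * 1. use the explicitly given subject, if any;
     * 2. otherwise, try a subject with the same name as the topic;
     * 3. otherwise, try to infer it from the last message on the topic.
     */
    public Optional<String> resolve(String topic, String explicitSubject) {
        if (explicitSubject != null) {
            return Optional.of(explicitSubject);
        }
        if (subjectExists.apply(topic)) {
            return Optional.of(topic);
        }
        return inferFromLastMessage.apply(topic);
    }
}
```

The stubbed functions make the order of the three strategies explicit and keep the sketch testable without a running registry.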
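The prefix/postfix matching from Artsem's original proposal (topic "test-topic-prod" mapped to subject "test-topic") could be sketched as a simple containment check against the list of registered subjects, preferring the longest match. Again, the class name is hypothetical and this is only one possible reading of "prefix and postfix are accepted":

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch of prefix/postfix matching between a Kafka topic
 * name and a Schema Registry subject: the topic name may surround the
 * subject name with an optional prefix and/or postfix.
 */
public class SubjectMatcher {
    /** Picks the longest registered subject contained in the topic name. */
    public static Optional<String> matchSubject(List<String> subjects, String topic) {
        return subjects.stream()
                .filter(topic::contains)           // subject appears inside the topic name
                .max(Comparator.comparingInt(String::length)); // prefer the most specific match
    }
}
```

Preferring the longest match avoids a short subject like "test" shadowing the more specific "test-topic" for topic "test-topic-prod".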