Hi Artsem,

having catalog support for the Confluent Schema Registry would be a great addition. Although the implementation of FLIP-30 is still ongoing, we merged the stable interfaces today [0]. This should unblock people from contributing new catalog implementations, so you could already start designing one. For now, the implementation could be covered by unit tests, until it can also be registered in a table environment for integration/end-to-end tests.
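
To give a rough idea, here is a minimal sketch of what such a catalog could look like against the interfaces from [0]. The interface and method names below are my assumption based on the FLIP-30 draft and may not match the merged API exactly:

import org.apache.flink.table.catalog.*; // assumed location of the new interfaces from [0]
import java.util.List;

// Hypothetical skeleton of a Schema-Registry-backed catalog. The
// ReadableCatalog/CatalogTable/ObjectPath names are assumptions based on
// the FLIP-30 draft; the bodies are left unimplemented on purpose.
public class ConfluentSchemaRegistryCatalog implements ReadableCatalog {

    private final String registryUrl;

    public ConfluentSchemaRegistryCatalog(String registryUrl) {
        this.registryUrl = registryUrl;
    }

    @Override
    public List<String> listTables(String databaseName) {
        // Would map each Schema Registry subject to one table name,
        // e.g. via GET /subjects on the registry's REST API.
        throw new UnsupportedOperationException("sketch only");
    }

    @Override
    public CatalogTable getTable(ObjectPath tablePath) {
        // Would fetch the subject's latest Avro schema and translate it into
        // a table schema plus Kafka connector / Avro format properties.
        throw new UnsupportedOperationException("sketch only");
    }
}

Such a class could already be unit tested against a mocked registry before the table environment integration is ready.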

I hope we can reuse the existing SQL Kafka connector and SQL Avro format.
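
For example, the catalog could emit, per table, the property set that those existing modules already understand. A small sketch (the exact keys are my assumption based on the current descriptor-based properties):

import java.util.HashMap;
import java.util.Map;

// Sketch: properties a catalog table could carry so that the existing SQL
// Kafka connector and SQL Avro format are reused unchanged. The exact keys
// are an assumption based on the current descriptor-based property format.
public static Map<String, String> tableProperties(String topic, String avroSchema) {
    Map<String, String> props = new HashMap<>();
    props.put("connector.type", "kafka");
    props.put("connector.version", "universal");
    props.put("connector.topic", topic);
    props.put("format.type", "avro");
    props.put("format.avro-schema", avroSchema); // as fetched from the registry
    return props;
}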

Looking forward to a JIRA issue and a short design document on how to connect the APIs.

Thanks,
Timo

[0] https://github.com/apache/flink/pull/8007

On 18.04.19 at 07:03, Bowen Li wrote:
Hi,

Thanks Artsem and Rong for bringing up the demand from the user perspective. A Kafka/Confluent Schema Registry catalog would have a good use case in Flink. We actually mentioned the potential of the unified Catalog APIs for Kafka in our talk a couple of weeks ago at Flink Forward SF [1], and we're glad to learn you are interested in contributing. I think creating a JIRA ticket linked from FLINK-11275 [2], and starting with discussions and a design, would help advance the effort.

The most interesting part of the Confluent Schema Registry, from my point of view, is its core idea of smoothing the real production experience and the things built around it, including versioned schemas, schema evolution, compatibility checks, etc. Introducing a Confluent-Schema-Registry-backed catalog to Flink may also help our design benefit from those ideas.
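
Most of that is already exposed by the registry's Java client, e.g. (a minimal sketch; the exact client calls are my assumption and worth double-checking against the client version you use):

import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaMetadata;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import java.util.List;

// Minimal sketch: list subjects and fetch versioned schemas from a
// Confluent Schema Registry with its Java client.
public class RegistryProbe {
    public static void main(String[] args) throws Exception {
        SchemaRegistryClient client =
                new CachedSchemaRegistryClient("http://localhost:8081", 100);

        // Each subject could back one catalog table.
        for (String subject : client.getAllSubjects()) {
            List<Integer> versions = client.getAllVersions(subject);
            SchemaMetadata latest = client.getLatestSchemaMetadata(subject);
            System.out.println(subject + ": " + versions.size()
                    + " version(s), latest schema: " + latest.getSchema());
        }
    }
}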

To add to Dawid's points: I assume the MVP for this project would be supporting Kafka as streaming tables through the new catalog. FLIP-30 covers both streaming and batch tables, so this work won't be blocked by the whole of FLIP-30. I think as soon as we finish the table operation APIs, finalize properties and formats, and connect the APIs to Calcite, this work can be unblocked. Timo and Xuefu may have more to say.

[1]
https://www.slideshare.net/BowenLi9/integrating-flink-with-hive-flink-forward-sf-2019/23
[2] https://issues.apache.org/jira/browse/FLINK-11275

On Wed, Apr 17, 2019 at 6:39 PM Jark Wu <imj...@gmail.com> wrote:

Hi Rong,

Thanks for pointing out the FLIPs missing from the FLIP main page. I added all the missing FLIPs (incl. FLIP-14, FLIP-22, FLIP-29, FLIP-30, FLIP-31) to the page.

I have also included @xuef...@alibaba-inc.com <xuef...@alibaba-inc.com> and @Bowen Li <bowenl...@gmail.com> in the thread, as they are familiar with the latest catalog design.

Thanks,
Jark

On Thu, 18 Apr 2019 at 02:39, Rong Rong <walter...@gmail.com> wrote:

Thanks Artsem for looking into this problem, and thanks Dawid for bringing up the discussion on FLIP-30.

We've observed similar scenarios where we would also like to reuse the schema registry for both the Kafka streams and the raw Kafka messages ingested into the data lake.
FYI, another more catalog-oriented document can be found here [1]. I do have one question to follow up on Dawid's point (2): are we suggesting that different Kafka topics (e.g. test-topic-prod, test-topic-non-prod, etc.) be considered as "views" of a logical table with a schema (e.g. test-topic)?

Also, it seems a few of the FLIPs, like the FLIP-30 page, are not linked from the main FLIP Confluence wiki page [2] for some reason.
I tried to fix that, but it seems I don't have permission. Maybe someone can take a look?

Thanks,
Rong


[1]
https://docs.google.com/document/d/1Y9it78yaUvbv4g572ZK_lZnZaAGjqwM_EhjdOv4yJtw/edit#heading=h.xp424vn7ioei
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals

On Wed, Apr 17, 2019 at 2:30 AM Artsem Semianenka <artfulonl...@gmail.com> wrote:

Thank you, Dawid!
This is very helpful information. I will keep a close eye on the updates to FLIP-30 and contribute whenever possible.
I guess I may create a JIRA ticket for my proposal in which I describe the idea and attach an intermediate pull request based on the current API (just for initial discussion). But the final pull request will definitely be based on the FLIP-30 API.

Best regards,
Artsem

On Wed, 17 Apr 2019 at 09:36, Dawid Wysakowicz <dwysakow...@apache.org> wrote:

Hi Artsem,

I think it totally makes sense to have a catalog for the Schema Registry. It is also good to hear you want to contribute that. There are a few important things to consider, though:

1. The Catalog interface is currently under rework. You may take a look at the corresponding FLIP-30 [1], and also have a look at the first PR that introduces the basic interfaces [2]. I think it would be worth considering those changes already. I cc Xuefu, who is participating in the Catalog integration efforts.

2. There is still an ongoing discussion about which properties we should store for streaming tables, and how. I think this might (but maybe doesn't have to) affect the design of the Catalog [3]. I cc Timo, who might give more insight into whether those discussions should block the work around this Catalog.

Best,

Dawid

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-30%3A+Unified+Catalog+APIs
[2] https://github.com/apache/flink/pull/8007
[3]
https://docs.google.com/document/d/1Yaxp1UJUFW-peGLt8EIidwKIZEWrrA-pznWLuvaH39Y/edit#heading=h.egn858cgizao

On 16/04/2019 17:35, Artsem Semianenka wrote:
Hi guys!

I'm working on an external catalog for Confluent Kafka. The main idea is to register an external catalog which provides the list of Kafka topics and to execute SQL queries like:
SELECT * FROM kafka.topic_name

I'm going to retrieve the table schema from the Confluent Schema Registry. The main disadvantage is that the topic name must contain the same name as its schema subject in the Schema Registry (a prefix and postfix are accepted).
For example:
topic: test-topic-prod
schema subject: test-topic
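
A tiny sketch of the matching rule I have in mind (the method name and the exact rule are placeholders, for illustration only):

import java.util.Collection;
import java.util.Comparator;
import java.util.Optional;

// Illustration only: resolve a topic to the schema subject it contains,
// allowing an optional prefix and/or postfix, e.g. topic "test-topic-prod"
// matches subject "test-topic".
public static Optional<String> resolveSubject(String topic, Collection<String> subjects) {
    return subjects.stream()
            .filter(topic::contains)
            // Prefer the longest subject if several are contained in the topic.
            .max(Comparator.comparingInt(String::length));
}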

I would like to contribute this solution to the main Flink branch and would like to discuss the pros and cons of this approach.

Best regards,
Artsem


--

Best regards,
Artsem Semianenka

