Re: [DISCUSS] FLIP-149: Introduce the KTable Connector

Jark Wu Mon, 19 Oct 2020 06:27:04 -0700

Hi Danny,

First of all, we didn't introduce any concepts from KSQL (e.g. Stream vs
Table notion).
This new connector will produce a changelog stream, so it's still a dynamic
table and doesn't conflict with Flink core concepts.


The "ktable" is just a connector name; we could also call it
"compacted-kafka" or something else.
We call it "ktable" only so that KSQL users can migrate to Flink SQL
easily.

Regarding why we introduce a new connector rather than a new property in
the existing kafka connector:

I think the main reason is that we want a clear separation between these
two use cases, because they are very different.
We also listed reasons in the FLIP, including:

1) It's hard to explain what the behavior is when users specify a start
offset in the middle of the topic (e.g. how to process delete events for
keys that do not exist).
    It's dangerous if users do that, so we don't provide the offset option
in the new connector at the moment.
2) It's a different perspective/abstraction on the same kafka topic (append
vs. upsert). It would be easier to understand if we separate them
    instead of mixing them in one connector. The new connector requires a
hash sink partitioner, a declared primary key, and a regular format.
    If we mix them in one connector, it might be confusing how to use the
options correctly.
3) The semantics of the KTable connector are the same as those of KTable in
Kafka Streams, so it's very handy for Kafka Streams and KSQL users.
    We have seen several questions on the mailing list asking how to model
a KTable and how to join a KTable in Flink SQL.
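
To make points 1) and 2) concrete, here is a minimal Python sketch. It is
not part of the FLIP and does not use any Flink or Kafka API; the names
read_as_changelog and partition_for are hypothetical, and zlib.crc32 is
only a stand-in for whatever hash a real sink partitioner would use. It
shows how reading a compacted topic from a middle offset can hit a delete
for a key that was never seen, and how hashing on the key keeps all
updates for one key in one partition.

```python
import zlib
from typing import Dict, List, Optional, Tuple

# One record of a compacted topic: (key, value). A value of None is a
# tombstone, i.e. a delete for that key.
Record = Tuple[str, Optional[str]]
# One changelog event: (kind, key, value).
Change = Tuple[str, str, Optional[str]]


def read_as_changelog(
    records: List[Record], start_offset: int = 0
) -> Tuple[List[Change], Dict[str, str]]:
    """Interpret records from start_offset onwards as upsert/delete events.

    Hypothetical helper for illustration only. Returns the emitted
    changelog and the table state materialized from it.
    """
    state: Dict[str, str] = {}
    changes: List[Change] = []
    for key, value in records[start_offset:]:
        if value is None:
            if key in state:
                del state[key]
                changes.append(("DELETE", key, None))
            else:
                # Delete for a key we never saw: the ambiguous case that
                # arises when reading starts in the middle of the topic.
                changes.append(("DELETE_UNKNOWN_KEY", key, None))
        else:
            kind = "UPDATE" if key in state else "INSERT"
            state[key] = value
            changes.append((kind, key, value))
    return changes, state


def partition_for(key: str, num_partitions: int) -> int:
    """Pick a partition from the key so one key always maps to one partition.

    crc32 is an assumption used for a deterministic example.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


topic: List[Record] = [("a", "1"), ("b", "2"), ("a", "3"), ("b", None)]

# Reading from the beginning gives a consistent table.
full_changes, full_state = read_as_changelog(topic)

# Reading from offset 3 sees only the tombstone for "b" and misses "a".
mid_changes, mid_state = read_as_changelog(topic, start_offset=3)
```

Under these assumptions, the read from the beginning ends with the single
row for key "a", while the read from offset 3 emits a delete for an
unknown key and ends with an empty table.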

Best,
Jark

On Mon, 19 Oct 2020 at 19:53, Jark Wu <[email protected]> wrote:

> Hi Jingsong,
>
> As the FLIP describes, "KTable connector produces a changelog stream,
> where each data record represents an update or delete event.".
> Therefore, a ktable source is an unbounded stream source. Selecting from a
> ktable source is similar to selecting from a kafka source with the
> debezium-json format, in that it never ends and the results are
> continuously updated.
>
> It's possible to have a bounded ktable source in the future, for example
> by adding an option such as 'bounded=true' or 'end-offset=xxx'.
> In this way, the ktable will produce a bounded changelog stream.
> So I think this can be added compatibly in the future.
>
> I don't think we should associate it with KSQL-related concepts. Actually,
> we didn't introduce any concepts from KSQL (e.g. Stream vs Table notion).
> The "ktable" is just a connector name; we could also call it
> "compacted-kafka" or something else.
> We call it "ktable" only so that KSQL users can migrate to Flink SQL
> easily.
>
> Regarding "value.fields-include", this is an option introduced in
> FLIP-107 for the Kafka connector.
> I think we should keep the same behavior as the Kafka connector. I'm not
> sure what the default behavior of KSQL is,
> but judging from the example in the docs (see the "users_original" table)
> [1], I guess it also stores the keys in the value.
>
> Best,
> Jark
>
> [1]:
> https://docs.confluent.io/current/ksqldb/tutorials/basics-local.html#create-a-stream-and-table
>
>
> On Mon, 19 Oct 2020 at 18:17, Danny Chan <[email protected]> wrote:
>
>> The concept seems to conflict with the Flink abstraction “dynamic table”;
>> in Flink we see both “stream” and “table” as a dynamic table.
>>
>> I think we should first make clear how to express stream-specific and
>> table-specific features on one “dynamic table”.
>> This is more natural for KSQL because KSQL takes stream and table as
>> different abstractions for representing collections. In KSQL, only a table
>> is mutable and can have a primary key.
>>
>> Does this connector belong to the “table” scope or the “stream” scope?
>>
>> Some of the concepts (such as the primary key on a stream) should be
>> suitable for all connectors, not just Kafka. Shouldn’t this be an
>> extension of the existing Kafka connector instead of a totally new
>> connector? What about the other connectors?
>>
>> Because this touches the core abstraction of Flink, we had better have a
>> top-down overall design; following KSQL directly is not the answer.
>>
>> P.S. For the source
>> > Shouldn’t this be an extension of existing Kafka connector instead of a
>> > totally new connector ?
>>
>> How could we achieve that (e.g. set up the parallelism correctly)?
>>
>> Best,
>> Danny Chan
>> On Oct 19, 2020, 5:17 PM +0800, Jingsong Li <[email protected]> wrote:
>> > Thanks Shengkai for your proposal.
>> >
>> > +1 for this feature.
>> >
>> > > Future Work: Support bounded KTable source
>> >
>> > I don't think it should be future work; I think it is one of the
>> > important concepts of this FLIP. We need to understand it now.
>> >
>> > Intuitively, a ktable in my opinion is a bounded table rather than a
>> > stream, so select should produce a bounded table by default.
>> >
>> > I think we can list the related Kafka knowledge, because the word
>> > `ktable` is easy to associate with KSQL-related concepts. (If possible,
>> > it's better to unify with it.)
>> >
>> > What do you think?
>> >
>> > > value.fields-include
>> >
>> > What about the default behavior of KSQL?
>> >
>> > Best,
>> > Jingsong
>> >
>> > On Mon, Oct 19, 2020 at 4:33 PM Shengkai Fang <[email protected]> wrote:
>> >
>> > > Hi, devs.
>> > >
>> > > Jark and I want to start a new FLIP to introduce the KTable connector.
>> > > KTable is short for "Kafka Table"; it also has the same semantics as
>> > > the KTable notion in Kafka Streams.
>> > >
>> > > FLIP-149:
>> > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+KTable+Connector
>> > >
>> > > Currently many users have expressed their need for upsert Kafka on
>> > > the mailing lists and in issues. The KTable connector has several
>> > > benefits for users:
>> > >
>> > > 1. Users are able to interpret a compacted Kafka topic as an upsert
>> > > stream in Apache Flink, and are also able to write a changelog stream
>> > > to Kafka (into a compacted topic).
>> > > 2. As part of a real-time pipeline, users can store join or aggregate
>> > > results (which may contain updates) in a Kafka topic for further
>> > > calculation.
>> > > 3. The semantics of the KTable connector are the same as those of
>> > > KTable in Kafka Streams, so it's very handy for Kafka Streams and KSQL
>> > > users. We have seen several questions on the mailing list asking how
>> > > to model a KTable and how to join a KTable in Flink SQL.
>> > >
>> > > We hope it can expand the usage of Flink with Kafka.
>> > >
>> > > I'm looking forward to your feedback.
>> > >
>> > > Best,
>> > > Shengkai
>> > >
>> >
>> >
>> > --
>> > Best, Jingsong Lee
>>
>
