Adding one more message: I have already updated the FLIP [1].

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector

Shengkai Fang <fskm...@gmail.com> wrote on Fri, Oct 23, 2020 at 2:55 PM:

> Hi, all.
> It seems we have reached a consensus on the FLIP. If no one has other
> objections, I would like to start the vote for FLIP-149.
>
> Best,
> Shengkai
>
> Jingsong Li <jingsongl...@gmail.com> wrote on Fri, Oct 23, 2020 at 2:25 PM:
>
>> Thanks for the explanation,
>>
>> I am OK with `upsert`. Yes, its concept has been accepted by many systems.
>>
>> Best,
>> Jingsong
>>
>> On Fri, Oct 23, 2020 at 12:38 PM Jark Wu <imj...@gmail.com> wrote:
>>
>> > Hi Timo,
>> >
>> > I have some concerns about `kafka-cdc`:
>> > 1) "cdc" is an abbreviation of Change Data Capture, which is commonly
>> > used for databases, not for message queues.
>> > 2) usually, cdc produces the full content of the changelog, including
>> > UPDATE_BEFORE; however, "upsert kafka" doesn't.
>> > 3) `kafka-cdc` sounds like native support for the `debezium-json`
>> > format; however, it is not, and we don't even want
>> >    "upsert kafka" to support "debezium-json".
>> >
>> >
>> > Hi Jingsong,
>> >
>> > I think the terminology of "upsert" is fine, because Kafka also uses
>> > "upsert" to define such behavior in their official documentation [1]:
>> >
>> > > a data record in a changelog stream is interpreted as an UPSERT aka
>> > INSERT/UPDATE
>> >
>> > Materialize uses the "UPSERT" keyword to define such behavior too [2].
>> > Users have been requesting such a feature using the "upsert kafka"
>> > terminology in user mailing lists [3][4].
>> > Many other systems support the "UPSERT" statement natively, such as
>> > Impala [5], SAP [6], Phoenix [7], Oracle NoSQL [8], etc.
>> >
>> > Therefore, I think we don't need to be afraid of introducing the
>> > "upsert" terminology; it is widely accepted by users.
>> >
>> > Best,
>> > Jark
>> >
>> >
>> > [1]:
>> >
>> >
>> https://kafka.apache.org/20/documentation/streams/developer-guide/dsl-api.html#streams_concepts_ktable
>> > [2]:
>> >
>> >
>> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
>> > [3]:
>> >
>> >
>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/SQL-materialized-upsert-tables-td18482.html#a18503
>> > [4]:
>> >
>> >
>> http://apache-flink.147419.n8.nabble.com/Kafka-Sink-AppendStreamTableSink-doesn-t-support-consuming-update-changes-td5959.html
>> > [5]:
>> https://impala.apache.org/docs/build/html/topics/impala_upsert.html
>> > [6]:
>> >
>> >
>> https://help.sap.com/viewer/7c78579ce9b14a669c1f3295b0d8ca16/Cloud/en-US/ea8b6773be584203bcd99da76844c5ed.html
>> > [7]: https://phoenix.apache.org/atomic_upsert.html
>> > [8]:
>> >
>> >
>> https://docs.oracle.com/en/database/other-databases/nosql-database/18.3/sqlfornosql/adding-table-rows-using-insert-and-upsert-statements.html
>> >
>> > On Fri, 23 Oct 2020 at 10:36, Jingsong Li <jingsongl...@gmail.com>
>> wrote:
>> >
>> > > The `kafka-cdc` name looks good to me.
>> > > We could even provide options to indicate whether to turn on
>> > > compaction, because compaction is just an optimization?
>> > >
>> > > - ktable makes me think of KSQL.
>> > > - kafka-compacted: it is not just compacted; more than that, it
>> > > still has the ability of CDC.
>> > > - upsert-kafka: upsert is back, and I don't really want to see it
>> > > again since we have CDC.
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> > > On Fri, Oct 23, 2020 at 2:21 AM Timo Walther <twal...@apache.org>
>> wrote:
>> > >
>> > > > Hi Jark,
>> > > >
>> > > > I would be fine with `connector=upsert-kafka`. Another idea would
>> be to
>> > > > align the name to other available Flink connectors [1]:
>> > > >
>> > > > `connector=kafka-cdc`.
>> > > >
>> > > > Regards,
>> > > > Timo
>> > > >
>> > > > [1] https://github.com/ververica/flink-cdc-connectors
>> > > >
>> > > > On 22.10.20 17:17, Jark Wu wrote:
>> > > > > Another name is "connector=upsert-kafka"; I think this can solve
>> > > > > Timo's concern about the "compacted" word.
>> > > > >
>> > > > > Materialize also uses the "ENVELOPE UPSERT" [1] keyword to
>> > > > > identify such kafka sources.
>> > > > > I think "upsert" is a well-known terminology widely used in many
>> > > > > systems, and it matches the behavior of how we handle the kafka
>> > > > > messages.
>> > > > >
>> > > > > What do you think?
>> > > > >
>> > > > > Best,
>> > > > > Jark
>> > > > >
>> > > > > [1]:
>> > > > >
>> > > >
>> > >
>> >
>> https://materialize.io/docs/sql/create-source/text-kafka/#upsert-on-a-kafka-topic
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, 22 Oct 2020 at 22:53, Kurt Young <ykt...@gmail.com>
>> wrote:
>> > > > >
>> > > > >> Good validation messages can't solve the broken user
>> > > > >> experience, especially since such an update mode option will
>> > > > >> implicitly make half of the current kafka options invalid or
>> > > > >> nonsensical.
>> > > > >>
>> > > > >> Best,
>> > > > >> Kurt
>> > > > >>
>> > > > >>
>> > > > >> On Thu, Oct 22, 2020 at 10:31 PM Jark Wu <imj...@gmail.com>
>> wrote:
>> > > > >>
>> > > > >>> Hi Timo, Seth,
>> > > > >>>
>> > > > >>> The default value "inserting" of "mode" might not be suitable,
>> > > > >>> because "debezium-json" emits changelog messages which include
>> > > > >>> updates.
>> > > > >>>
>> > > > >>> On Thu, 22 Oct 2020 at 22:10, Seth Wiesman <s...@ververica.com>
>> > > wrote:
>> > > > >>>
>> > > > >>>> +1 for supporting upsert results into Kafka.
>> > > > >>>>
>> > > > >>>> I have no comments on the implementation details.
>> > > > >>>>
>> > > > >>>> As far as configuration goes, I tend to favor Timo's option
>> > > > >>>> where we add a "mode" property to the existing Kafka table
>> > > > >>>> with default value "inserting".
>> > > > >>>> If the mode is set to "updating", then the validation changes
>> > > > >>>> to the new requirements. I personally find it more intuitive
>> > > > >>>> than a separate connector; my fear is that users won't
>> > > > >>>> understand it's the same physical kafka sink under the hood,
>> > > > >>>> and it will lead to other confusion, like: does it offer the
>> > > > >>>> same persistence guarantees? I think we are capable of adding
>> > > > >>>> good validation messaging that solves Jark's and Kurt's
>> > > > >>>> concerns.
>> > > > >>>>
>> > > > >>>>
>> > > > >>>> On Thu, Oct 22, 2020 at 8:51 AM Timo Walther <
>> twal...@apache.org>
>> > > > >> wrote:
>> > > > >>>>
>> > > > >>>>> Hi Jark,
>> > > > >>>>>
>> > > > >>>>> "calling it "kafka-compacted" can even remind users to enable
>> log
>> > > > >>>>> compaction"
>> > > > >>>>>
>> > > > >>>>> But sometimes users like to store a lineage of changes in
>> > > > >>>>> their topics, independent of any ktable/kstream
>> > > > >>>>> interpretation.
>> > > > >>>>>
>> > > > >>>>> I'll let the majority decide on this topic so as not to
>> > > > >>>>> further block this effort. But we might find a better name,
>> > > > >>>>> like:
>> > > > >>>>>
>> > > > >>>>> connector = kafka
>> > > > >>>>> mode = updating/inserting
>> > > > >>>>>
>> > > > >>>>> OR
>> > > > >>>>>
>> > > > >>>>> connector = kafka-updating
>> > > > >>>>>
>> > > > >>>>> ...
>> > > > >>>>>
>> > > > >>>>> Regards,
>> > > > >>>>> Timo
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>> On 22.10.20 15:24, Jark Wu wrote:
>> > > > >>>>>> Hi Timo,
>> > > > >>>>>>
>> > > > >>>>>> Thanks for your opinions.
>> > > > >>>>>>
>> > > > >>>>>> 1) Implementation
>> > > > >>>>>> We will have a stateful operator to generate INSERT and
>> > > > >>>>>> UPDATE_BEFORE messages.
>> > > > >>>>>> This operator is keyed (primary key as the shuffle key)
>> > > > >>>>>> after the source operator.
>> > > > >>>>>> The implementation of this operator is very similar to the
>> > > > >>>>>> existing `DeduplicateKeepLastRowFunction`.
>> > > > >>>>>> The operator will register a value state using the primary
>> > > > >>>>>> key fields as keys.
>> > > > >>>>>> When the value state is empty under the current key, we will
>> > > > >>>>>> emit INSERT for the input row.
>> > > > >>>>>> When the value state is not empty under the current key, we
>> > > > >>>>>> will emit UPDATE_BEFORE using the row in state, and emit
>> > > > >>>>>> UPDATE_AFTER using the input row.
>> > > > >>>>>> When the input row is a DELETE, we will clear the state and
>> > > > >>>>>> emit a DELETE row.
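The per-key state machine described above can be sketched in a few lines of plain Python (an illustrative sketch only, not Flink's actual `DeduplicateKeepLastRowFunction` API; the function and row-kind names here are made up for clarity):

```python
def to_changelog(upsert_stream):
    # Convert a keyed upsert/delete stream into a full changelog.
    # `upsert_stream` yields (key, row) pairs; row is None for a
    # tombstone (DELETE) message.
    state = {}  # stands in for the keyed value state
    for key, row in upsert_stream:
        if row is None:                       # DELETE message
            old = state.pop(key, None)
            if old is not None:
                yield ("DELETE", old)
        elif key not in state:                # first message for this key
            state[key] = row
            yield ("INSERT", row)
        else:                                 # key seen before: retract + update
            yield ("UPDATE_BEFORE", state[key])
            state[key] = row
            yield ("UPDATE_AFTER", row)

# Two upserts on the same key, then a tombstone:
events = [("u1", "a"), ("u1", "b"), ("u1", None)]
print(list(to_changelog(events)))
# [('INSERT', 'a'), ('UPDATE_BEFORE', 'a'), ('UPDATE_AFTER', 'b'), ('DELETE', 'b')]
```

The sketch shows why the operator needs one state entry per primary key: the previous row image must be kept to emit UPDATE_BEFORE and to distinguish an insert from an update.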
>> > > > >>>>>>
>> > > > >>>>>> 2) new option vs new connector
>> > > > >>>>>>> We recently simplified the table options to a minimum
>> amount of
>> > > > >>>>>> characters to be as concise as possible in the DDL.
>> > > > >>>>>> I think this is the reason why we want to introduce a new
>> > > > >> connector,
>> > > > >>>>>> because we can simplify the options in DDL.
>> > > > >>>>>> For example, if using a new option, the DDL may look like
>> this:
>> > > > >>>>>>
>> > > > >>>>>> CREATE TABLE users (
>> > > > >>>>>>     user_id BIGINT,
>> > > > >>>>>>     user_name STRING,
>> > > > >>>>>>     user_level STRING,
>> > > > >>>>>>     region STRING,
>> > > > >>>>>>     PRIMARY KEY (user_id) NOT ENFORCED
>> > > > >>>>>> ) WITH (
>> > > > >>>>>>     'connector' = 'kafka',
>> > > > >>>>>>     'model' = 'table',
>> > > > >>>>>>     'topic' = 'pageviews_per_region',
>> > > > >>>>>>     'properties.bootstrap.servers' = '...',
>> > > > >>>>>>     'properties.group.id' = 'testGroup',
>> > > > >>>>>>     'scan.startup.mode' = 'earliest',
>> > > > >>>>>>     'key.format' = 'csv',
>> > > > >>>>>>     'key.fields' = 'user_id',
>> > > > >>>>>>     'value.format' = 'avro',
>> > > > >>>>>>     'sink.partitioner' = 'hash'
>> > > > >>>>>> );
>> > > > >>>>>>
>> > > > >>>>>> If using a new connector, we can have different default
>> > > > >>>>>> values for the options and remove unnecessary options;
>> > > > >>>>>> the DDL can then look like this, which is much more concise:
>> > > > >>>>>>
>> > > > >>>>>> CREATE TABLE pageviews_per_region (
>> > > > >>>>>>     user_id BIGINT,
>> > > > >>>>>>     user_name STRING,
>> > > > >>>>>>     user_level STRING,
>> > > > >>>>>>     region STRING,
>> > > > >>>>>>     PRIMARY KEY (user_id) NOT ENFORCED
>> > > > >>>>>> ) WITH (
>> > > > >>>>>>     'connector' = 'kafka-compacted',
>> > > > >>>>>>     'topic' = 'pageviews_per_region',
>> > > > >>>>>>     'properties.bootstrap.servers' = '...',
>> > > > >>>>>>     'key.format' = 'csv',
>> > > > >>>>>>     'value.format' = 'avro'
>> > > > >>>>>> );
>> > > > >>>>>>
>> > > > >>>>>>> When people read `connector=kafka-compacted` they might not
>> > know
>> > > > >>> that
>> > > > >>>> it
>> > > > >>>>>>> has ktable semantics. You don't need to enable log
>> compaction
>> > in
>> > > > >>> order
>> > > > >>>>>>> to use a KTable as far as I know.
>> > > > >>>>>> We don't need to let users know it has ktable semantics; as
>> > > > >>>>>> Konstantin mentioned, this may carry more implicit meaning
>> > > > >>>>>> than we want to imply here. I agree users don't need to
>> > > > >>>>>> enable log compaction, but from a production perspective,
>> > > > >>>>>> log compaction should always be enabled if the topic is used
>> > > > >>>>>> for this purpose.
>> > > > >>>>>> Calling it "kafka-compacted" can even remind users to
>> > > > >>>>>> enable log compaction.
>> > > > >>>>>>
>> > > > >>>>>> I don't agree with introducing a "model = table/stream"
>> > > > >>>>>> option, or "connector=kafka-table", because this means we
>> > > > >>>>>> are introducing the Table vs. Stream concept from KSQL.
>> > > > >>>>>> However, we don't have such a top-level concept in Flink
>> > > > >>>>>> SQL now; this will further confuse users.
>> > > > >>>>>> In Flink SQL, everything is a STREAM; the differences are
>> > > > >>>>>> whether it is bounded or unbounded, and whether it is
>> > > > >>>>>> insert-only or a changelog.
>> > > > >>>>>>
>> > > > >>>>>>
>> > > > >>>>>> Best,
>> > > > >>>>>> Jark
>> > > > >>>>>>
>> > > > >>>>>>
>> > > > >>>>>> On Thu, 22 Oct 2020 at 20:39, Timo Walther <
>> twal...@apache.org>
>> > > > >>> wrote:
>> > > > >>>>>>
>> > > > >>>>>>> Hi Shengkai, Hi Jark,
>> > > > >>>>>>>
>> > > > >>>>>>> thanks for this great proposal. It is time to finally
>> connect
>> > the
>> > > > >>>>>>> changelog processor with a compacted Kafka topic.
>> > > > >>>>>>>
>> > > > >>>>>>> "The operator will produce INSERT rows, or additionally
>> > generate
>> > > > >>>>>>> UPDATE_BEFORE rows for the previous image, or produce DELETE
>> > rows
>> > > > >>> with
>> > > > >>>>>>> all columns filled with values."
>> > > > >>>>>>>
>> > > > >>>>>>> Could you elaborate a bit on the implementation details in
>> > > > >>>>>>> the FLIP? How are UPDATE_BEFOREs generated? How much state
>> > > > >>>>>>> is required to perform this operation?
>> > > > >>>>>>>
>> > > > >>>>>>> From a conceptual and semantic point of view, I'm fine
>> > > > >>>>>>> with the proposal. But I would like to share my opinion
>> > > > >>>>>>> about how we expose this feature:
>> > > > >>>>>>>
>> > > > >>>>>>> ktable vs kafka-compacted
>> > > > >>>>>>>
>> > > > >>>>>>> I'm against having an additional connector like `ktable` or
>> > > > >>>>>>> `kafka-compacted`. We recently simplified the table options
>> to
>> > a
>> > > > >>>> minimum
>> > > > >>>>>>> amount of characters to be as concise as possible in the
>> DDL.
>> > > > >>>> Therefore,
>> > > > >>>>>>> I would keep the `connector=kafka` and introduce an
>> additional
>> > > > >>> option.
>> > > > >>>>>>> Because a user wants to read "from Kafka". And the "how"
>> should
>> > > be
>> > > > >>>>>>> determined in the lower options.
>> > > > >>>>>>>
>> > > > >>>>>>> When people read `connector=ktable` they might not know that
>> > this
>> > > > >> is
>> > > > >>>>>>> Kafka. Or they wonder where `kstream` is?
>> > > > >>>>>>>
>> > > > >>>>>>> When people read `connector=kafka-compacted` they might not
>> > know
>> > > > >>> that
>> > > > >>>> it
>> > > > >>>>>>> has ktable semantics. You don't need to enable log
>> compaction
>> > in
>> > > > >>> order
>> > > > >>>>>>> to use a KTable as far as I know. Log compaction and table
>> > > > >> semantics
>> > > > >>>> are
>> > > > >>>>>>> orthogonal topics.
>> > > > >>>>>>>
>> > > > >>>>>>> In the end we will need 3 types of information when
>> declaring a
>> > > > >>> Kafka
>> > > > >>>>>>> connector:
>> > > > >>>>>>>
>> > > > >>>>>>> CREATE TABLE ... WITH (
>> > > > >>>>>>>      connector=kafka        -- Some information about the
>> > > connector
>> > > > >>>>>>>      end-offset = XXXX      -- Some information about the
>> > > > >> boundedness
>> > > > >>>>>>>      model = table/stream   -- Some information about
>> > > > >> interpretation
>> > > > >>>>>>> )
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> We can still apply all the constraints mentioned in the
>> > > > >>>>>>> FLIP when `model` is set to `table`.
>> > > > >>>>>>>
>> > > > >>>>>>> What do you think?
>> > > > >>>>>>>
>> > > > >>>>>>> Regards,
>> > > > >>>>>>> Timo
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>> On 21.10.20 14:19, Jark Wu wrote:
>> > > > >>>>>>>> Hi,
>> > > > >>>>>>>>
>> > > > >>>>>>>> IMO, if we are going to mix them in one connector,
>> > > > >>>>>>>> 1) either users need to set some options to a specific
>> value
>> > > > >>>>> explicitly,
>> > > > >>>>>>>> e.g. "scan.startup.mode=earliest", "sink.partitioner=hash",
>> > > etc..
>> > > > >>>>>>>> This makes the connector awkward to use. Users may have
>> > > > >>>>>>>> to fix options one by one according to the exceptions.
>> > > > >>>>>>>> Besides, in the future, it is still possible to use
>> > > > >>>>>>>> "sink.partitioner=fixed" (to reduce network cost) if
>> > > > >>>>>>>> users are aware of the partition routing; however, it's
>> > > > >>>>>>>> error-prone to have "fixed" as the default for compacted
>> > > > >>>>>>>> mode.
>> > > > >>>>>>>>
>> > > > >>>>>>>> 2) or give those options different default values when
>> > > > >>>>>>>> "compacted=true".
>> > > > >>>>>>>> It would be confusing and unpredictable if the default
>> > > > >>>>>>>> values of options change according to other options.
>> > > > >>>>>>>> What happens if we have a third mode in the future?
>> > > > >>>>>>>>
>> > > > >>>>>>>> In terms of usage and options, it's very different from
>> > > > >>>>>>>> the original "kafka" connector.
>> > > > >>>>>>>> It would be handier to use and less error-prone if we
>> > > > >>>>>>>> separate them into two connectors.
>> > > > >>>>>>>> In the implementation layer, we can reuse code as much as
>> > > > >>>>>>>> possible.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Therefore, I'm still +1 to have a new connector.
>> > > > >>>>>>>> The "kafka-compacted" name sounds good to me.
>> > > > >>>>>>>>
>> > > > >>>>>>>> Best,
>> > > > >>>>>>>> Jark
>> > > > >>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>> On Wed, 21 Oct 2020 at 17:58, Konstantin Knauf <
>> > > > >> kna...@apache.org>
>> > > > >>>>>>> wrote:
>> > > > >>>>>>>>
>> > > > >>>>>>>>> Hi Kurt, Hi Shengkai,
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> thanks for answering my questions and the additional
>> > > > >>>>>>>>> clarifications. I don't have a strong opinion on whether
>> > > > >>>>>>>>> to extend the "kafka" connector or to introduce a new
>> > > > >>>>>>>>> connector. So, from my perspective, feel free to go with
>> > > > >>>>>>>>> a separate connector. If we do introduce a new
>> > > > >>>>>>>>> connector, I wouldn't call it "ktable" for the
>> > > > >>>>>>>>> aforementioned reasons (in addition, we might suggest
>> > > > >>>>>>>>> that there is also a "kstreams" connector for symmetry
>> > > > >>>>>>>>> reasons). I don't have a good alternative name, though;
>> > > > >>>>>>>>> maybe "kafka-compacted" or "compacted-kafka".
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Thanks,
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> Konstantin
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>> On Wed, Oct 21, 2020 at 4:43 AM Kurt Young <
>> ykt...@gmail.com
>> > >
>> > > > >>>> wrote:
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>> Hi all,
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> I want to describe the discussion process that drove us
>> > > > >>>>>>>>>> to this conclusion; it might make some of the design
>> > > > >>>>>>>>>> choices easier to understand and keep everyone on the
>> > > > >>>>>>>>>> same page.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Back to the motivation: what functionality do we want
>> > > > >>>>>>>>>> to provide in the first place? We got a lot of feedback
>> > > > >>>>>>>>>> and questions on the mailing lists from people who want
>> > > > >>>>>>>>>> to write not-insert-only messages into kafka. They
>> > > > >>>>>>>>>> might do so intentionally or by accident, e.g. by
>> > > > >>>>>>>>>> writing a non-windowed aggregate query or a
>> > > > >>>>>>>>>> non-windowed left outer join. And some users from the
>> > > > >>>>>>>>>> KSQL world also asked why Flink didn't leverage the key
>> > > > >>>>>>>>>> concept of every kafka topic and treat kafka as a
>> > > > >>>>>>>>>> dynamically changing keyed table.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> To work with kafka better, we were thinking of
>> > > > >>>>>>>>>> extending the functionality of the current kafka
>> > > > >>>>>>>>>> connector by letting it accept updates and deletions.
>> > > > >>>>>>>>>> But due to the limitations of kafka, the update has to
>> > > > >>>>>>>>>> be "update by key", i.e. a table with a primary key.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> This introduces a couple of conflicts with the current
>> > > > >>>>>>>>>> kafka table's options:
>> > > > >>>>>>>>>> 1. key.fields: as said above, we need the kafka table
>> > > > >>>>>>>>>> to have a primary key constraint, but users can also
>> > > > >>>>>>>>>> configure key.fields freely; this might cause friction.
>> > > > >>>>>>>>>> (Sure, we can do some sanity checks on this, but that
>> > > > >>>>>>>>>> also creates friction.)
>> > > > >>>>>>>>>> 2. sink.partitioner: to make the semantics right, we
>> > > > >>>>>>>>>> need to make sure all the updates on the same key are
>> > > > >>>>>>>>>> written to the same kafka partition, so we should force
>> > > > >>>>>>>>>> a hash-by-key partitioner inside such a table. Again,
>> > > > >>>>>>>>>> this conflicts and creates friction with current user
>> > > > >>>>>>>>>> options.
>> > > > >>>>>>>>>>
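The hash-by-key requirement boils down to something like the following (a simplified illustration only, not Kafka's or Flink's actual partitioner interface; the function name is made up):

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Stable hash of the record key: every upsert/delete for the same
    # key maps to the same partition, so per-key ordering is preserved.
    return zlib.crc32(key) % num_partitions

# All changes for user_id 42 land in the same partition:
assert partition_for(b"42", 16) == partition_for(b"42", 16)
```

A "fixed" or round-robin partitioner would scatter changes for one key across partitions, breaking the per-key ordering that upsert semantics rely on.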
>> > > > >>>>>>>>>> The above issues are solvable, though not perfectly or
>> > > > >>>>>>>>>> in the most user-friendly way.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Let's take a look at the reading side. The keyed kafka
>> > > > >>>>>>>>>> table contains two kinds of messages: upserts and
>> > > > >>>>>>>>>> deletions. What upsert means is "if the key doesn't
>> > > > >>>>>>>>>> exist yet, it's an insert record; otherwise it's an
>> > > > >>>>>>>>>> update record". For the sake of correctness and
>> > > > >>>>>>>>>> simplicity, the Flink SQL engine also needs this
>> > > > >>>>>>>>>> information. If we interpret all messages as "update
>> > > > >>>>>>>>>> records", some queries or operators may not work
>> > > > >>>>>>>>>> properly. It's weird to see an update record when you
>> > > > >>>>>>>>>> haven't seen the corresponding insert record before.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> So after reading the records from such a table, Flink
>> > > > >>>>>>>>>> needs to create state recording which keys have been
>> > > > >>>>>>>>>> seen and then generate the correct row kind
>> > > > >>>>>>>>>> accordingly. This couples the state and the data of the
>> > > > >>>>>>>>>> message queue, and it also creates conflicts with the
>> > > > >>>>>>>>>> current kafka connector.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Think about what happens if users suspend a running job
>> > > > >>>>>>>>>> (which now contains some reading state) and then change
>> > > > >>>>>>>>>> the start offset of the reader. By changing the reading
>> > > > >>>>>>>>>> offset, they actually change the whole story of which
>> > > > >>>>>>>>>> records should be insert messages and which records
>> > > > >>>>>>>>>> should be update messages. It will also make Flink deal
>> > > > >>>>>>>>>> with another weird situation: it might receive a
>> > > > >>>>>>>>>> deletion for a non-existing message.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> We were unsatisfied with all the friction and conflicts
>> > > > >>>>>>>>>> it would create if we enabled "upsert & deletion"
>> > > > >>>>>>>>>> support in the current kafka connector. Later we began
>> > > > >>>>>>>>>> to realize that we shouldn't treat it as a normal
>> > > > >>>>>>>>>> message queue, but as a changing keyed table. We should
>> > > > >>>>>>>>>> always be able to get the whole data of such a table
>> > > > >>>>>>>>>> (by disabling the start offset option), and we can also
>> > > > >>>>>>>>>> read the changelog out of such a table. It's like an
>> > > > >>>>>>>>>> HBase table with binlog support, but without random
>> > > > >>>>>>>>>> access capability (which can be fulfilled by Flink's
>> > > > >>>>>>>>>> state).
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> So our intention was: instead of telling and persuading
>> > > > >>>>>>>>>> users which options they should or should not use after
>> > > > >>>>>>>>>> extending the current kafka connector with upsert
>> > > > >>>>>>>>>> support, we actually create a whole new and different
>> > > > >>>>>>>>>> connector that has totally different abstractions in
>> > > > >>>>>>>>>> the SQL layer, and should be treated totally
>> > > > >>>>>>>>>> differently from the current kafka connector.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Hope this can clarify some of the concerns.
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> Best,
>> > > > >>>>>>>>>> Kurt
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>> On Tue, Oct 20, 2020 at 5:20 PM Shengkai Fang <
>> > > > >> fskm...@gmail.com
>> > > > >>>>
>> > > > >>>>>>> wrote:
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>>> Hi devs,
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> As many people are still confused about the different
>> > > > >>>>>>>>>>> option behaviours between the Kafka connector and the
>> > > > >>>>>>>>>>> KTable connector, Jark and I have listed the
>> > > > >>>>>>>>>>> differences in the doc [1].
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Best,
>> > > > >>>>>>>>>>> Shengkai
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> [1]
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> https://docs.google.com/document/d/13oAWAwQez0lZLsyfV21BfTEze1fc2cz4AZKiNOyBNPk/edit
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>> Shengkai Fang <fskm...@gmail.com> wrote on Tue, Oct 20, 2020 at 12:05 PM:
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>>> Hi Konstantin,
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Thanks for your reply.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> It uses the "kafka" connector and does not specify a
>> > > primary
>> > > > >>>> key.
>> > > > >>>>>>>>>>>> The dimension table `users` uses the ktable
>> > > > >>>>>>>>>>>> connector, and we can specify the primary key on the
>> > > > >>>>>>>>>>>> KTable.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Will it be possible to use a "ktable" as a
>> > > > >>>>>>>>>>>>> dimension table in FLIP-132?
>> > > > >>>>>>>>>>>> Yes. We can specify the watermark on the KTable and it
>> can
>> > > be
>> > > > >>>> used
>> > > > >>>>>>>>> as a
>> > > > >>>>>>>>>>>> dimension table in temporal join.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Introduce a new connector vs introduce a new property
>> > > > >>>>>>>>>>>> The main reason behind this is that the KTable
>> > > > >>>>>>>>>>>> connector has almost no common options with the Kafka
>> > > > >>>>>>>>>>>> connector. The options that can be reused by the
>> > > > >>>>>>>>>>>> KTable connector are 'topic',
>> > > > >>>>>>>>>>>> 'properties.bootstrap.servers' and
>> > > > >>>>>>>>>>>> 'value.fields-include'. We can't set a CDC format for
>> > > > >>>>>>>>>>>> 'key.format' and 'value.format' in the KTable
>> > > > >>>>>>>>>>>> connector now, which is possible in the Kafka
>> > > > >>>>>>>>>>>> connector. Considering the difference between the
>> > > > >>>>>>>>>>>> options we can use, it's more suitable to introduce
>> > > > >>>>>>>>>>>> another connector rather than a property.
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> We are also fine with using "compacted-kafka" as the
>> > > > >>>>>>>>>>>> name of the new connector. What do you think?
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>> Shengkai
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>> Konstantin Knauf <kna...@apache.org> wrote on Mon, Oct 19, 2020 at 10:15 PM:
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Hi Shengkai,
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Thank you for driving this effort. I believe this is
>> > > > >>>>>>>>>>>>> a very important feature for many users who use
>> > > > >>>>>>>>>>>>> Kafka and Flink SQL together. A few questions and
>> > > > >>>>>>>>>>>>> thoughts:
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> * Is your example "Use KTable as a reference/dimension
>> > > > >> table"
>> > > > >>>>>>>>> correct?
>> > > > >>>>>>>>>>> It
>> > > > >>>>>>>>>>>>> uses the "kafka" connector and does not specify a
>> primary
>> > > > >> key.
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> * Will it be possible to use a "ktable" table directly
>> > as a
>> > > > >>>>>>>>>> dimensional
>> > > > >>>>>>>>>>>>> table in temporal join (*based on event time*)
>> > (FLIP-132)?
>> > > > >>> This
>> > > > >>>> is
>> > > > >>>>>>>>> not
>> > > > >>>>>>>>>>>>> completely clear to me from the FLIP.
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> * I'd personally prefer not to introduce a new
>> connector
>> > > and
>> > > > >>>>> instead
>> > > > >>>>>>>>>> to
>> > > > >>>>>>>>>>>>> extend the Kafka connector. We could add an additional
>> > > > >>> property
>> > > > >>>>>>>>>>>>> "compacted"
>> > > > >>>>>>>>>>>>> = "true"|"false". If it is set to "true", we can add
>> > > > >>> additional
>> > > > >>>>>>>>>>> validation
>> > > > >>>>>>>>>>>>> logic (e.g. "scan.startup.mode" can not be set,
>> primary
>> > key
>> > > > >>>>>>>>> required,
>> > > > >>>>>>>>>>>>> etc.). If we stick to a separate connector I'd not
>> call
>> > it
>> > > > >>>>> "ktable",
>> > > > >>>>>>>>>> but
>> > > > >>>>>>>>>>>>> rather "compacted-kafka" or similar. KTable seems to
>> > carry
>> > > > >>> more
>> > > > >>>>>>>>>> implicit
>> > > > >>>>>>>>>>>>> meaning than we want to imply here.
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> * I agree that this is not a bounded source. If we
>> want
>> > to
>> > > > >>>>> support a
>> > > > >>>>>>>>>>>>> bounded mode, this is an orthogonal concern that also
>> > > > >> applies
>> > > > >>> to
>> > > > >>>>>>>>> other
>> > > > >>>>>>>>>>>>> unbounded sources.
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Konstantin
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> On Mon, Oct 19, 2020 at 3:26 PM Jark Wu <
>> > imj...@gmail.com>
>> > > > >>>> wrote:
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> Hi Danny,
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> First of all, we didn't introduce any concepts from
>> KSQL
>> > > > >>> (e.g.
>> > > > >>>>>>>>>> Stream
>> > > > >>>>>>>>>>> vs
>> > > > >>>>>>>>>>>>>> Table notion).
>> > > > >>>>>>>>>>>>>> This new connector will produce a changelog stream,
>> so
>> > > it's
>> > > > >>>> still
>> > > > >>>>>>>>> a
>> > > > >>>>>>>>>>>>> dynamic
>> > > > >>>>>>>>>>>>>> table and doesn't conflict with Flink core concepts.
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> The "ktable" is just a connector name, we can also
>> call
>> > it
>> > > > >>>>>>>>>>>>>> "compacted-kafka" or something else.
>> > > > >>>>>>>>>>>>>> Calling it "ktable" is just because KSQL users can
>> > migrate
>> > > > >> to
>> > > > >>>>>>>>> Flink
>> > > > >>>>>>>>>>> SQL
>> > > > >>>>>>>>>>>>>> easily.
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> Regarding why we introduce a new connector vs. a
>> > > > >>>>>>>>>>>>>> new property in the existing kafka connector:
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> I think the main reason is that we want a clear
>> > > > >>>>>>>>>>>>>> separation for these two use cases, because they
>> > > > >>>>>>>>>>>>>> are very different.
>> > > > >>>>>>>>>>>>>> We also listed reasons in the FLIP, including:
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> 1) It's hard to explain what the behavior is when users specify a
>> > > > >>>>>>>>>>>>>> start offset at a middle position (e.g. how to process delete
>> > > > >>>>>>>>>>>>>> events for keys that don't exist). It's dangerous if users do
>> > > > >>>>>>>>>>>>>> that, so we don't provide the offset option in the new connector
>> > > > >>>>>>>>>>>>>> at the moment.
>> > > > >>>>>>>>>>>>>> 2) It's a different perspective/abstraction on the same kafka
>> > > > >>>>>>>>>>>>>> topic (append vs. upsert). It would be easier to understand if
>> > > > >>>>>>>>>>>>>> we separate them instead of mixing them in one connector. The
>> > > > >>>>>>>>>>>>>> new connector requires a hash sink partitioner, a declared
>> > > > >>>>>>>>>>>>>> primary key, and a regular format. If we mixed them in one
>> > > > >>>>>>>>>>>>>> connector, it might be confusing how to use the options
>> > > > >>>>>>>>>>>>>> correctly.
>> > > > >>>>>>>>>>>>>> 3) The semantics of the KTable connector are the same as KTable
>> > > > >>>>>>>>>>>>>> in Kafka Streams, so it's very handy for Kafka Streams and KSQL
>> > > > >>>>>>>>>>>>>> users. We have seen several questions on the mailing list asking
>> > > > >>>>>>>>>>>>>> how to model a KTable and how to join a KTable in Flink SQL.
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>> Jark
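The separation Jark describes (upsert perspective, required primary key, regular key/value formats) can be sketched as a Flink SQL DDL. This is a hypothetical sketch only: at this point in the thread the connector name was still being debated ("ktable" vs. "compacted-kafka"; the FLIP later settled on "upsert-kafka"), and the option names shown are assumptions, not the final specification.

```sql
-- Hypothetical DDL sketch of the proposed upsert connector.
-- Connector and option names are assumptions; the thread had not fixed them yet.
CREATE TABLE users (
  user_id   BIGINT,
  user_name STRING,
  region    STRING,
  -- Required by the proposal: the primary key maps to the Kafka message key.
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',
  'topic' = 'users',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json'
);
```

Note that, per reason 1) above, the sketch deliberately has no start-offset option: the connector always reads the topic from the beginning so that delete events never refer to unseen keys.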
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>> On Mon, 19 Oct 2020 at 19:53, Jark Wu <
>> imj...@gmail.com
>> > >
>> > > > >>>> wrote:
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> Hi Jingsong,
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> As the FLIP describes, the "KTable connector produces a
>> > > > >>>>>>>>>>>>>>> changelog stream, where each data record represents an update
>> > > > >>>>>>>>>>>>>>> or delete event". Therefore, a ktable source is an unbounded
>> > > > >>>>>>>>>>>>>>> stream source. Selecting from a ktable source is similar to
>> > > > >>>>>>>>>>>>>>> selecting from a kafka source with the debezium-json format:
>> > > > >>>>>>>>>>>>>>> it never ends, and the results are continuously updated.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> It's possible to have a bounded ktable source in the future,
>> > > > >>>>>>>>>>>>>>> for example by adding an option 'bounded=true' or
>> > > > >>>>>>>>>>>>>>> 'end-offset=xxx'. In this way, the ktable would produce a
>> > > > >>>>>>>>>>>>>>> bounded changelog stream. So I think this can be a compatible
>> > > > >>>>>>>>>>>>>>> feature in the future.
>> > > > >>>>>>>>>>>>>>>
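If such a bounded mode were added later, it might look like the snippet below. This is purely illustrative: Jark explicitly presents 'bounded=true' as a possible future option, not an existing one, so the option (and the connector name) are assumptions.

```sql
-- Purely hypothetical: a bounded read of the changelog, per Jark's suggestion.
-- The 'bounded' option did not exist at the time of this discussion.
CREATE TABLE users_snapshot (
  user_id   BIGINT,
  user_name STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',   -- name assumed; under discussion in this thread
  'topic' = 'users',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json',
  'bounded' = 'true'              -- hypothetical future option
);
```

With such an option, a `SELECT` would terminate after consuming the topic to its current end, materializing the compacted topic as a finite table.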
>> > > > >>>>>>>>>>>>>>> I don't think we should associate it with KSQL-related
>> > > > >>>>>>>>>>>>>>> concepts. Actually, we didn't introduce any concepts from KSQL
>> > > > >>>>>>>>>>>>>>> (e.g. the Stream vs. Table notion). The "ktable" is just a
>> > > > >>>>>>>>>>>>>>> connector name; we could also call it "compacted-kafka" or
>> > > > >>>>>>>>>>>>>>> something else. We call it "ktable" just so that KSQL users
>> > > > >>>>>>>>>>>>>>> can migrate to Flink SQL easily.
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> Regarding "value.fields-include": this is an option introduced
>> > > > >>>>>>>>>>>>>>> in FLIP-107 for the Kafka connector, and I think we should keep
>> > > > >>>>>>>>>>>>>>> the same behavior as the Kafka connector. I'm not sure what the
>> > > > >>>>>>>>>>>>>>> default behavior of KSQL is, but I guess it also stores the
>> > > > >>>>>>>>>>>>>>> keys in the value, judging from the example docs (see the
>> > > > >>>>>>>>>>>>>>> "users_original" table) [1].
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>>> Jark
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> [1]:
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://docs.confluent.io/current/ksqldb/tutorials/basics-local.html#create-a-stream-and-table
>> > > > >>>>>>>>>>>>>>>
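For reference, the FLIP-107 option Jark mentions controls whether the key columns are also serialized into the Kafka message value. A sketch of the two settings follows; the option values shown ('ALL' and 'EXCEPT_KEY') are the ones FLIP-107 proposed for the Kafka SQL connector, and the connector name here is an assumption from this thread.

```sql
-- 'value.fields-include' = 'ALL' (the default): key columns are duplicated
-- into the message value alongside the other columns.
-- 'value.fields-include' = 'EXCEPT_KEY': the value carries only non-key columns.
CREATE TABLE users (
  user_id   BIGINT,
  user_name STRING,
  PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
  'connector' = 'upsert-kafka',           -- name assumed; under discussion here
  'topic' = 'users',
  'properties.bootstrap.servers' = 'localhost:9092',
  'key.format' = 'json',
  'value.format' = 'json',
  'value.fields-include' = 'EXCEPT_KEY'   -- serialize only user_name into the value
);
```

Keeping the same default as the plain Kafka connector ('ALL') matches the KSQL example docs Jark cites, where the key also appears in the value.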
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>> On Mon, 19 Oct 2020 at 18:17, Danny Chan <
>> > > > >>>> yuzhao....@gmail.com>
>> > > > >>>>>>>>>>>>> wrote:
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> The concept seems to conflict with the Flink abstraction
>> > > > >>>>>>>>>>>>>>>> "dynamic table": in Flink we see both "stream" and "table" as
>> > > > >>>>>>>>>>>>>>>> a dynamic table.
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> I think we should first make clear how to express stream- and
>> > > > >>>>>>>>>>>>>>>> table-specific features on one "dynamic table". This is more
>> > > > >>>>>>>>>>>>>>>> natural for KSQL, because KSQL treats stream and table as
>> > > > >>>>>>>>>>>>>>>> different abstractions for representing collections; in KSQL,
>> > > > >>>>>>>>>>>>>>>> only a table is mutable and can have a primary key.
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> Does this connector belong to the "table" scope or the
>> > > > >>>>>>>>>>>>>>>> "stream" scope?
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> Some of the concepts (such as a primary key on a stream)
>> > > > >>>>>>>>>>>>>>>> apply to all connectors, not just Kafka. Shouldn't this be an
>> > > > >>>>>>>>>>>>>>>> extension of the existing Kafka connector instead of a totally
>> > > > >>>>>>>>>>>>>>>> new connector? What about the other connectors?
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> Because this touches the core abstraction of Flink, we had
>> > > > >>>>>>>>>>>>>>>> better have a top-down overall design; following KSQL directly
>> > > > >>>>>>>>>>>>>>>> is not the answer.
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> P.S. For the source
>> > > > >>>>>>>>>>>>>>>>> Shouldn’t this be an extension of existing Kafka
>> > > > >> connector
>> > > > >>>>>>>>>>> instead
>> > > > >>>>>>>>>>>>> of
>> > > > >>>>>>>>>>>>>> a
>> > > > >>>>>>>>>>>>>>>> totally new connector ?
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> How could we achieve that (e.g. set up the parallelism
>> > > > >>>>>>>>>>>>>>>> correctly)?
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>>>> Danny Chan
>> > > > >>>>>>>>>>>>>>>> 在 2020年10月19日 +0800 PM5:17,Jingsong Li <
>> > > > >>>> jingsongl...@gmail.com
>> > > > >>>>>>>>>>>> ,写道:
>> > > > >>>>>>>>>>>>>>>>> Thanks Shengkai for your proposal.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> +1 for this feature.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Future Work: Support bounded KTable source
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> I don't think it should be future work; it is one of the
>> > > > >>>>>>>>>>>>>>>>> important concepts of this FLIP, and we need to understand
>> > > > >>>>>>>>>>>>>>>>> it now.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> Intuitively, a ktable in my opinion is a bounded table
>> > > > >>>>>>>>>>>>>>>>> rather than a stream, so a select should produce a bounded
>> > > > >>>>>>>>>>>>>>>>> table by default.
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> I think we should list the related Kafka knowledge, because
>> > > > >>>>>>>>>>>>>>>>> the word `ktable` is easily associated with KSQL-related
>> > > > >>>>>>>>>>>>>>>>> concepts. (If possible, it's better to unify with them.)
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> What do you think?
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> value.fields-include
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> What about the default behavior of KSQL?
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>>>>> Jingsong
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> On Mon, Oct 19, 2020 at 4:33 PM Shengkai Fang <
>> > > > >>>>>>>>>> fskm...@gmail.com
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>> wrote:
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Hi, devs.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Jark and I want to start a new FLIP to introduce
>> the
>> > > > >>> KTable
>> > > > >>>>>>>>>>>>>>>> connector. The
>> > > > >>>>>>>>>>>>>>>>>> KTable is short for "Kafka Table", and it has the same
>> > > > >>>>>>>>>>>>>>>>>> semantics as the KTable notion in Kafka Streams.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> FLIP-149:
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > >
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+KTable+Connector
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Many users have expressed their need for upsert Kafka
>> > > > >>>>>>>>>>>>>>>>>> support on the mailing lists and in issues. The KTable
>> > > > >>>>>>>>>>>>>>>>>> connector has several benefits for users:
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> 1. Users are able to interpret a compacted Kafka topic as
>> > > > >>>>>>>>>>>>>>>>>> an upsert stream in Apache Flink, and are also able to write
>> > > > >>>>>>>>>>>>>>>>>> a changelog stream to Kafka (into a compacted topic).
>> > > > >>>>>>>>>>>>>>>>>> 2. As part of a real-time pipeline, store join or
>> > > > >>>>>>>>>>>>>>>>>> aggregation results (which may contain updates) in a Kafka
>> > > > >>>>>>>>>>>>>>>>>> topic for further calculation;
>> > > > >>>>>>>>>>>>>>>>>> 3. The semantics of the KTable connector are the same as
>> > > > >>>>>>>>>>>>>>>>>> KTable in Kafka Streams, so it's very handy for Kafka
>> > > > >>>>>>>>>>>>>>>>>> Streams and KSQL users. We have seen several questions on
>> > > > >>>>>>>>>>>>>>>>>> the mailing list asking how to model a KTable and how to
>> > > > >>>>>>>>>>>>>>>>>> join a KTable in Flink SQL.
>> > > > >>>>>>>>>>>>>>>>>>
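The "how to join a KTable" pattern the mailing-list questions ask about can be sketched as below: an append-only click stream enriched against a table declared with the proposed upsert connector. The table and column names are invented for illustration, and `clicks`/`users` are assumed to be declared elsewhere (with `users` using the connector sketched in this FLIP).

```sql
-- Hypothetical enrichment join: each click is matched against the latest
-- version of the user row materialized from the compacted topic.
-- Because 'users' is a changelog source, the join result is continuously
-- updated as new user versions arrive.
SELECT c.click_time, c.url, u.user_name, u.region
FROM clicks AS c
JOIN users AS u
  ON c.user_id = u.user_id;
```

This is exactly the KTable-style lookup that Kafka Streams users expect, expressed as a regular Flink SQL join on a changelog-backed table.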
>> > > > >>>>>>>>>>>>>>>>>> We hope it can expand the usage of the Flink with
>> > > > >> Kafka.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> I'm looking forward to your feedback.
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>> Best,
>> > > > >>>>>>>>>>>>>>>>>> Shengkai
>> > > > >>>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>>> --
>> > > > >>>>>>>>>>>>>>>>> Best, Jingsong Lee
>> > > > >>>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> --
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> Konstantin Knauf
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> https://twitter.com/snntrable
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>> https://github.com/knaufk
>> > > > >>>>>>>>>>>>>
>> > > > >>>>>>>>>>>>
>> > > > >>>>>>>>>>>
>> > > > >>>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>>
>> > > > >>>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>>
>> > > > >>>>>>
>> > > > >>>>>
>> > > > >>>>>
>> > > > >>>>
>> > > > >>>> --
>> > > > >>>>
>> > > > >>>> Seth Wiesman | Solutions Architect
>> > > > >>>>
>> > > > >>>> +1 314 387 1463
>> > > > >>>>
>> > > > >>>> <https://www.ververica.com/>
>> > > > >>>>
>> > > > >>>> Follow us @VervericaData
>> > > > >>>>
>> > > > >>>> --
>> > > > >>>>
>> > > > >>>> Join Flink Forward <https://flink-forward.org/> - The Apache
>> > Flink
>> > > > >>>> Conference
>> > > > >>>>
>> > > > >>>> Stream Processing | Event Driven | Real Time
>> > > > >>>>
>> > > > >>>
>> > > > >>
>> > > > >
>> > > >
>> > > >
>> > >
>> > > --
>> > > Best, Jingsong Lee
>> > >
>> >
>>
>>
>>
>
