I see, I understand: what you mean is avoiding the loss of historical data.
Logically, another option is to never clean up, so compaction doesn't have to
be turned on.
I am OK with the implementation; it just feels like this shouldn't be a hard
logical limitation.
Best,
Jingsong
On Fri, Oct 23, 2020 at 4:09 PM Kurt Young wrote:
To be precise, it means the Kafka topic should have its configuration
"cleanup.policy" set to "compact", not "delete".
Best,
Kurt
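
For reference, a minimal sketch of creating a topic with that cleanup policy
using the plain Kafka AdminClient; the topic name, broker address, and
partition/replication counts are placeholders:

import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateCompactedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // "cleanup.policy=compact" keeps the latest record per key instead of
            // dropping old segments purely by retention time.
            NewTopic topic = new NewTopic("users", 4, (short) 1)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}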
On Fri, Oct 23, 2020 at 4:01 PM Jingsong Li wrote:
> I just notice there is a limitation in the FLIP:
>
> > Generally speaking, the underlying topic of the upsert-kafka s
I just notice there is a limitation in the FLIP:
> Generally speaking, the underlying topic of the upsert-kafka source must
be compacted. Besides, the underlying topic must have all the data with the
same key in the same partition, otherwise, the result will be wrong.
According to my understandin
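
To illustrate the second requirement quoted above: with the default Kafka
partitioner, keyed writes already place all records with the same key in the
same partition. A small sketch (topic name, broker address, and payloads are
placeholders):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedWriteExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The default partitioner hashes the record key, so both updates for
            // user "42" land in the same partition and keep their relative order.
            producer.send(new ProducerRecord<>("users", "42", "{\"name\": \"Alice\"}"));
            producer.send(new ProducerRecord<>("users", "42", "{\"name\": \"Alice Smith\"}"));
        }
    }
}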
+1 for voting
Regards,
Timo
On 23.10.20 09:07, Jark Wu wrote:
Thanks Shengkai!
+1 to start voting.
Best,
Jark
On Fri, 23 Oct 2020 at 15:02, Shengkai Fang wrote:
Add one more message, I have already updated the FLIP[1].
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
Thanks Shengkai!
+1 to start voting.
Best,
Jark
On Fri, 23 Oct 2020 at 15:02, Shengkai Fang wrote:
> Add one more message, I have already updated the FLIP[1].
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
>
> On Fri, Oct 23, 2020 at 2:55 PM, Shengkai Fang wrote:
Add one more message, I have already updated the FLIP[1].
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
On Fri, Oct 23, 2020 at 2:55 PM, Shengkai Fang wrote:
> Hi, all.
> It seems we have reached a consensus on the FLIP. If no one has other
> objections
Hi, all.
It seems we have reached a consensus on the FLIP. If no one has other
objections, I would like to start the vote for FLIP-149.
Best,
Shengkai
On Fri, Oct 23, 2020 at 2:25 PM, Jingsong Li wrote:
> Thanks for the explanation,
>
> I am OK with `upsert`. Yes, its concept has been accepted by many systems.
>
>
Thanks for the explanation,
I am OK with `upsert`. Yes, its concept has been accepted by many systems.
Best,
Jingsong
On Fri, Oct 23, 2020 at 12:38 PM Jark Wu wrote:
> Hi Timo,
>
> I have some concerns about `kafka-cdc`,
> 1) cdc is an abbreviation of Change Data Capture which is commonly used for
Hi Timo,
I have some concerns about `kafka-cdc`,
1) cdc is an abbreviation of Change Data Capture, which is commonly used for
databases, not for message queues.
2) usually, cdc produces the full content of the changelog, including
UPDATE_BEFORE; however, "upsert kafka" doesn't
3) `kafka-cdc` sounds like a n
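
To make the distinction in point 2 concrete, a small sketch using Flink's
RowKind (org.apache.flink.types.RowKind); the rows and field values are made
up for illustration:

import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;

public class ChangelogKinds {
    public static void main(String[] args) {
        // A full CDC changelog (e.g. debezium-json) carries the old image as well:
        Row before = Row.ofKind(RowKind.UPDATE_BEFORE, "user_1", "Alice");
        Row after  = Row.ofKind(RowKind.UPDATE_AFTER,  "user_1", "Alice Smith");

        // An upsert stream only carries the new image per key (plus tombstones for
        // deletes); the UPDATE_BEFORE row above is not available in the topic.
        Row upsert = Row.ofKind(RowKind.UPDATE_AFTER, "user_1", "Alice Smith");

        System.out.println(before + " " + after + " vs upsert-only: " + upsert);
    }
}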
The `kafka-cdc` looks good to me.
We could even offer an option to indicate whether to turn on compaction, because
compaction is just an optimization?
- ktable: it makes me think of KSQL.
- kafka-compacted: it is not just compacted; more than that, it still has
the ability of CDC.
- upsert-kafka: upsert is back,
Hi Jark,
I would be fine with `connector=upsert-kafka`. Another idea would be to
align the name to other available Flink connectors [1]:
`connector=kafka-cdc`.
Regards,
Timo
[1] https://github.com/ververica/flink-cdc-connectors
On 22.10.20 17:17, Jark Wu wrote:
Another name is "connector=u
Another name is "connector=upsert-kafka', I think this can solve Timo's
concern on the "compacted" word.
Materialize also uses "ENVELOPE UPSERT" [1] keyword to identify such kafka
sources.
I think "upsert" is a well-known terminology widely used in many systems
and matches the
behavior of how we
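
For context, a sketch of what a DDL with the proposed name could look like;
apart from 'connector', the option names below are assumptions modeled on the
existing kafka connector, since the final option set is still part of this
discussion:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaDdlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // The primary key defines which Kafka message key the upsert semantics apply to.
        tEnv.executeSql(
                "CREATE TABLE users (" +
                "  user_id STRING," +
                "  user_name STRING," +
                "  PRIMARY KEY (user_id) NOT ENFORCED" +
                ") WITH (" +
                "  'connector' = 'upsert-kafka'," +
                "  'topic' = 'users'," +
                "  'properties.bootstrap.servers' = 'localhost:9092'," +
                "  'key.format' = 'json'," +
                "  'value.format' = 'json'" +
                ")");
    }
}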
Good validation messages can't fix the broken user experience, especially
since such an update mode option would implicitly make half of the current
kafka options invalid or meaningless.
Best,
Kurt
On Thu, Oct 22, 2020 at 10:31 PM Jark Wu wrote:
> Hi Timo, Seth,
>
> The default value "insertin
Hi Timo, Seth,
The default value "inserting" of "mode" might be not suitable,
because "debezium-json" emits changelog messages which include updates.
On Thu, 22 Oct 2020 at 22:10, Seth Wiesman wrote:
> +1 for supporting upsert results into Kafka.
>
> I have no comments on the implementation det
+1 for supporting upsert results into Kafka.
I have no comments on the implementation details.
As far as configuration goes, I tend to favor Timo's option where we add a
"mode" property to the existing Kafka table with default value "inserting".
If the mode is set to "updating" then the validatio
Hi Jark,
"calling it "kafka-compacted" can even remind users to enable log
compaction"
But sometimes users like to store a lineage of changes in their topics,
independent of any ktable/kstream interpretation.
I'll let the majority decide on this topic so as not to further block this
effort. But we mig
Hi Timo,
Thanks for your opinions.
1) Implementation
We will have a stateful operator to generate INSERT and UPDATE_BEFORE.
This operator is keyed by the primary key (as the shuffle key) after the source
operator.
The implementation of this operator is very similar to the existing
`DeduplicateKeepLa
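
For readers who want to picture the operator described above, a rough sketch
of the idea as a keyed function that remembers the last row per primary key;
this is illustrative only, not the planner's actual implementation, and
delete/tombstone handling is omitted:

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.types.Row;
import org.apache.flink.types.RowKind;
import org.apache.flink.util.Collector;

// Turns an upsert stream (latest image per key) into a full changelog by
// remembering the previous row for each primary key.
public class UpsertToChangelog extends KeyedProcessFunction<String, Row, Row> {

    private transient ValueState<Row> lastRow;

    @Override
    public void open(Configuration parameters) {
        lastRow = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-row", Row.class));
    }

    @Override
    public void processElement(Row current, Context ctx, Collector<Row> out) throws Exception {
        Row previous = lastRow.value();
        if (previous == null) {
            // First time we see this key: emit a plain INSERT.
            current.setKind(RowKind.INSERT);
            out.collect(current);
        } else {
            // Retract the previous image, then emit the new one.
            previous.setKind(RowKind.UPDATE_BEFORE);
            out.collect(previous);
            current.setKind(RowKind.UPDATE_AFTER);
            out.collect(current);
        }
        lastRow.update(current);
    }
}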
Hi Shengkai, Hi Jark,
thanks for this great proposal. It is time to finally connect the
changelog processor with a compacted Kafka topic.
"The operator will produce INSERT rows, or additionally generate
UPDATE_BEFORE rows for the previous image, or produce DELETE rows with
all columns filled
Hi,
IMO, if we are going to mix them in one connector,
1) either users need to set some options to a specific value explicitly,
e.g. "scan.startup.mode=earliest", "sink.partitioner=hash", etc.
This makes the connector awkward to use. Users may have to fix options one
by one according to the excep
Hi Kurt, Hi Shengkai,
thanks for answering my questions and the additional clarifications. I
don't have a strong opinion on whether to extend the "kafka" connector or
to introduce a new connector. So, from my perspective feel free to go with
a separate connector. If we do introduce a new connector
Hi all,
I want to describe the discussion process that drove us to this
conclusion; it might make some of
the design choices easier to understand and keep everyone on the same page.
Back to the motivation, what functionality do we want to provide in the
first place? We got a lot of feedba
Hi devs,
As many people are still confused about the different option behaviours
between the Kafka connector and the KTable connector, Jark and I have listed
the differences in the doc [1].
Best,
Shengkai
[1]
https://docs.google.com/document/d/13oAWAwQez0lZLsyfV21BfTEze1fc2cz4AZKiNOyBNPk/edit
Shengkai Fan
Hi Konstantin,
Thanks for your reply.
> It uses the "kafka" connector and does not specify a primary key.
The dimensional table `users` uses the ktable connector, and we can specify the
pk on the KTable.
> Will it be possible to use a "ktable" as a dimensional table in FLIP-132
Yes. We can specify the w
Hi Shengkai,
Thank you for driving this effort. I believe this a very important feature
for many users who use Kafka and Flink SQL together. A few questions and
thoughts:
* Is your example "Use KTable as a reference/dimension table" correct? It
uses the "kafka" connector and does not specify a pr
Hi Danny,
First of all, we didn't introduce any concepts from KSQL (e.g. Stream vs
Table notion).
This new connector will produce a changelog stream, so it's still a dynamic
table and doesn't conflict with Flink core concepts.
The "ktable" is just a connector name, we can also call it
"compacted-
Hi Jingsong,
As the FLIP describes, "KTable connector produces a changelog stream, where
each data record represents an update or delete event."
Therefore, a ktable source is an unbounded stream source. Selecting a
ktable source is similar to selecting a kafka source with debezium-json
format
tha
The concept seems to conflict with the Flink abstraction "dynamic table"; in
Flink we see both "stream" and "table" as a dynamic table.
I think we should first make clear how to express stream- and table-specific
features on one "dynamic table";
it is more natural for KSQL because KSQL takes stream
Thanks Shengkai for your proposal.
+1 for this feature.
> Future Work: Support bounded KTable source
I don't think it should be future work; I think it is one of the
important concepts of this FLIP. We need to understand it now.
Intuitively, a ktable in my opinion is a bounded table rather th
Hi, devs.
Jark and I want to start a new FLIP to introduce the KTable connector.
"KTable" is short for "Kafka Table"; it also has the same semantics as the
KTable notion in Kafka Streams.
FLIP-149:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-149%3A+Introduce+the+KTable+Connector