Hey Yaroslav,

Thanks for your response!

Got it, so the need for UPDATE_BEFOREs will depend on your sinks. I just watched the talk and it makes sense when you think of the UPDATE_BEFOREs as retractions.
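
To check my understanding, here is a toy sketch in plain Python. It is not Flink code and not how Flink implements anything; the `Change` tuple and both functions are made up for illustration, and only the "+I" / "-U" / "+U" / "-D" labels are borrowed from Flink's RowKind short strings. It assumes one consumer that upserts by primary key and another that keeps a running sum without remembering previous values per key.

```python
from collections import namedtuple

# Hypothetical changelog row; kind is "+I", "-U", "+U" or "-D".
Change = namedtuple("Change", ["kind", "key", "amount"])


def apply_to_upsert_table(changes):
    """Keyed consumer: +I/+U overwrite by key, -D deletes, -U is ignored."""
    table = {}
    for c in changes:
        if c.kind in ("+I", "+U"):
            table[c.key] = c.amount
        elif c.kind == "-D":
            table.pop(c.key, None)
    return table


def running_sum(changes):
    """Consumer with no per-key memory: adds on +I/+U, subtracts on -U/-D."""
    total = 0
    for c in changes:
        if c.kind in ("+I", "+U"):
            total += c.amount
        else:
            total -= c.amount
    return total


# One row inserted with amount 10, then updated to 25.
full = [Change("+I", 1, 10), Change("-U", 1, 10), Change("+U", 1, 25)]
# The same stream with the UPDATE_BEFORE dropped, as in our redacted topics.
redacted = [c for c in full if c.kind != "-U"]

print(apply_to_upsert_table(full), apply_to_upsert_table(redacted))
print(running_sum(full), running_sum(redacted))
```

In this toy model the upsert view ends at {1: 25} either way, but the running sum comes out as 35 instead of 25 once the -U is gone, which is how I read the "retraction" framing.
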

In the talk, Timo discusses how removing the need for UPDATE_BEFORE is an optimization of sorts, if your use case allows for it, since it removes a bunch of messages that would otherwise be processed by Flink. I'm wondering about the converse: are there any situations where having UPDATE_BEFOREs results in improved performance? Does the planner take advantage of them in some situations? I don't have a specific example in mind; I'm just trying to understand the full implications of missing UPDATE_BEFORE messages.

On Wed, Feb 7, 2024 at 4:24 PM Yaroslav Tkachenko <yaros...@goldsky.com.invalid> wrote:

> Hey Kevin,
>
> In my experience it mostly depends on the type of your sinks. If all of
> your sinks can leverage primary keys and support upsert semantics, you
> don't really need UPDATE_BEFOREs altogether (you can even filter them out).
> But if you have sinks with append-only semantics (OR if you don't have
> primary keys defined) you need UPDATE_BEFOREs to correctly support
> retractions (in case of updates and deletes).
>
> Great talk on this topic:
> https://www.youtube.com/watch?v=iRlLaY-P6iE&ab_channel=PlainSchwarz (the
> middle part is the most relevant).
>
>
> On Wed, Feb 7, 2024 at 12:13 PM Kevin Lam <kevin....@shopify.com.invalid>
> wrote:
>
> > Hi there!
> >
> > I have a question about Changelog Stream Processing with Flink SQL and
> > the Flink Table API. I would like to better understand how UPDATE_BEFORE
> > fields are used by Flink.
> >
> > Our team uses Debezium to extract Change Data Capture events from MySQL
> > databases. We currently redact the `before` fields in the envelope [0] so
> > that redacted PII doesn't sit in our Kafka topics in the `before` field
> > of UPDATE events.
> >
> > As a result if we were to consume these CDC streams with Flink, there
> > would be missing UPDATE_BEFORE fields for UPDATE events. What kind of
> > impact would this have on performance and correctness, if any? Any other
> > considerations we should be aware of?
> >
> > Thanks in advance for your help!
> >
> >
> > [0]
> > https://debezium.io/documentation/reference/stable/connectors/mysql.html
> >
>