Hey Kevin,

In my experience it mostly depends on the type of your sinks. If all of your sinks can leverage primary keys and support upsert semantics, you don't really need UPDATE_BEFOREs at all (you can even filter them out). But if you have sinks with append-only semantics (or if you don't have primary keys defined), you need UPDATE_BEFOREs to correctly support retractions in case of updates and deletes.
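To make the distinction concrete, here is a minimal sketch (not Flink API; the event kinds mirror Flink's RowKind markers, and the two apply functions are hypothetical stand-ins for an upsert sink and an append/retract sink):

```python
from collections import Counter

# A toy changelog: (kind, key, value). Kinds follow Flink's RowKind:
# +I insert, -U UPDATE_BEFORE, +U UPDATE_AFTER, -D delete.
changelog = [
    ("+I", 1, "alice"),
    ("-U", 1, "alice"),   # UPDATE_BEFORE: retraction of the old row
    ("+U", 1, "alicia"),  # UPDATE_AFTER: the new row
    ("+I", 2, "bob"),
    ("-D", 2, "bob"),
]

def apply_upsert(events):
    """Keyed upsert sink: +U/-D already carry the key, so -U is redundant."""
    state = {}
    for kind, key, val in events:
        if kind in ("+I", "+U"):
            state[key] = val
        elif kind == "-D":
            state.pop(key, None)
        # -U is simply ignored (it could be filtered out upstream)
    return state

def apply_retract(events):
    """Keyless sink keeping a multiset of rows: it needs -U/-D events
    to cancel previously emitted rows."""
    rows = Counter()
    for kind, key, val in events:
        if kind in ("+I", "+U"):
            rows[(key, val)] += 1
        else:  # -U or -D: retract a previously emitted row
            rows[(key, val)] -= 1
            if rows[(key, val)] == 0:
                del rows[(key, val)]
    return rows

# Drop the UPDATE_BEFOREs, as a redacting CDC pipeline effectively does:
no_before = [e for e in changelog if e[0] != "-U"]

print(apply_upsert(no_before))   # still correct: {1: 'alicia'}
print(apply_retract(changelog))  # correct: only (1, 'alicia') remains
print(apply_retract(no_before))  # wrong: stale (1, 'alice') row lingers
```

The upsert sink stays correct without UPDATE_BEFOREs because the primary key tells it which old row to overwrite; the keyless retract sink has no way to identify the old row, so the stale value is never removed.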
Great talk on this topic: https://www.youtube.com/watch?v=iRlLaY-P6iE&ab_channel=PlainSchwarz (the middle part is the most relevant).

On Wed, Feb 7, 2024 at 12:13 PM Kevin Lam <kevin....@shopify.com.invalid> wrote:

> Hi there!
>
> I have a question about Changelog Stream Processing with Flink SQL and the
> Flink Table API. I would like to better understand how UPDATE_BEFORE fields
> are used by Flink.
>
> Our team uses Debezium to extract Change Data Capture events from MySQL
> databases. We currently redact the `before` fields in the envelope [0] so
> that redacted PII doesn't sit in our Kafka topics in the `before` field of
> UPDATE events.
>
> As a result, if we were to consume these CDC streams with Flink, there would
> be missing UPDATE_BEFORE fields for UPDATE events. What kind of impact
> would this have on performance and correctness, if any? Any other
> considerations we should be aware of?
>
> Thanks in advance for your help!
>
> [0] https://debezium.io/documentation/reference/stable/connectors/mysql.html