Apologies, I forgot to finish. If the Kafka source performs its own retraction of the old value for a key (user_id) on every append it receives, I think that should resolve this discrepancy.
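In case it helps make that concrete, here's a rough sketch of what I'm imagining on the consuming side, assuming Flink 1.12+'s upsert-kafka connector (the table name, topic, and connection properties below are just placeholders):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaSourceSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Source over the compacted topic, keyed by user_id. The upsert-kafka
        // connector treats each record as an upsert on the PRIMARY KEY, so when
        // a new value arrives for user_id = 1 the planner emits a retraction of
        // the previously seen value before the new append, even if the original
        // retract record was removed by log compaction.
        tEnv.executeSql(
            "CREATE TABLE users (\n"
                + "  user_id BIGINT,\n"
                + "  state STRING,\n"
                + "  PRIMARY KEY (user_id) NOT ENFORCED\n"
                + ") WITH (\n"
                + "  'connector' = 'upsert-kafka',\n"
                + "  'topic' = 'users',\n"
                + "  'properties.bootstrap.servers' = 'localhost:9092',\n"
                + "  'key.format' = 'json',\n"
                + "  'value.format' = 'json'\n"
                + ")");

        // Downstream filter on state: because the source re-derives retractions
        // per key, a user that moves from "california" to "ohio" gets retracted
        // from this result instead of being silently missed.
        tEnv.executeSql(
            "SELECT user_id, state FROM users WHERE state = 'california'").print();
    }
}

My understanding is that the upsert source keeps the latest value per key in state to generate those retractions, so it only needs the most recent record per key to survive compaction, but please correct me if that's wrong.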
Again, is this true? Anything else I'm missing? Thanks!

On Tue, Feb 23, 2021 at 6:12 PM Rex Fenley <r...@remind101.com> wrote:

> Hi,
>
> I'm concerned about the impact of Kafka's log compaction when sending
> data between running Flink jobs.
>
> For example, one job produces retract stream records in the sequence
>
> (false, (user_id: 1, state: "california"))  -- retract
> (true, (user_id: 1, state: "ohio"))         -- append
>
> These are consumed by Kafka and keyed by user_id, so they could end up
> compacted to just
>
> (true, (user_id: 1, state: "ohio"))  -- append
>
> If some other downstream Flink job has a filter on state == "california"
> and reads from the Kafka stream, I assume it will miss the retract
> message altogether and produce incorrect results.
>
> Is this true? How do we prevent this from happening? We need to use
> compaction since all our jobs are based on CDC and we can't just drop
> data after x number of days.
>
> Thanks
>
> --
> Rex Fenley | Software Engineer - Mobile and Backend
> Remind.com <https://www.remind.com/>

--
Rex Fenley | Software Engineer - Mobile and Backend
Remind.com <https://www.remind.com/>