Apologies, I forgot to finish my thought. If the Kafka source performs its
own retraction of the old value for a key (user_id) each time it receives a
new append for that key, that should resolve this discrepancy, I think.
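
Concretely, I believe the 'upsert-kafka' connector that shipped in Flink
1.12 behaves this way. Here's a rough sketch of the kind of source I mean
(the topic, server, and field names are all made up):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class UpsertKafkaSourceSketch {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    // An upsert source over the compacted topic: Flink keeps per-key
    // state behind it and emits a retraction (UPDATE_BEFORE) of the
    // previous value whenever a new record arrives for a user_id, so
    // downstream operators see a complete retract stream even though
    // compaction dropped the original retract record.
    tEnv.executeSql(
        "CREATE TABLE user_states (" +
        "  user_id BIGINT," +
        "  state STRING," +
        "  PRIMARY KEY (user_id) NOT ENFORCED" +
        ") WITH (" +
        "  'connector' = 'upsert-kafka'," +
        "  'topic' = 'user_states'," +
        "  'properties.bootstrap.servers' = 'localhost:9092'," +
        "  'key.format' = 'json'," +
        "  'value.format' = 'json'" +
        ")");
  }
}

If that's right, a downstream filter on state = 'california' would still
receive a retraction for user 1 even after the original retract record was
compacted away.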

Again, is this true? Anything else I'm missing?

Thanks!


On Tue, Feb 23, 2021 at 6:12 PM Rex Fenley <r...@remind101.com> wrote:

> Hi,
>
> I'm concerned about the impact of Kafka's log compaction when sending data
> between running Flink jobs.
>
> For example, suppose one job produces retract stream records in the
> sequence
> (false, (user_id: 1, state: "california")) -- retract
> (true, (user_id: 1, state: "ohio")) -- append
> and writes them to a Kafka topic keyed by user_id. With compaction, that
> topic could end up retaining just
> (true, (user_id: 1, state: "ohio")) -- append
> If some other downstream Flink job has a filter on state == "california"
> and reads from that topic, I assume it will miss the retract message
> altogether and produce incorrect results.
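>
> To make the failure mode concrete, the downstream job I have in mind
> would look roughly like this (just a sketch -- the topic, server, and
> field names are made up, and I'm assuming the changelog is encoded with
> a CDC format like debezium-json):
>
> import org.apache.flink.table.api.EnvironmentSettings;
> import org.apache.flink.table.api.TableEnvironment;
>
> public class CaliforniaFilterSketch {
>   public static void main(String[] args) {
>     TableEnvironment tEnv = TableEnvironment.create(
>         EnvironmentSettings.newInstance().inStreamingMode().build());
>
>     // Changelog source over the compacted topic.
>     tEnv.executeSql(
>         "CREATE TABLE user_states_src (" +
>         "  user_id BIGINT," +
>         "  state STRING" +
>         ") WITH (" +
>         "  'connector' = 'kafka'," +
>         "  'topic' = 'user_states'," +
>         "  'properties.bootstrap.servers' = 'localhost:9092'," +
>         "  'scan.startup.mode' = 'earliest-offset'," +
>         "  'format' = 'debezium-json'" +
>         ")");
>
>     // If compaction already dropped the retract record for
>     // (user_id: 1, state: "california"), this filter never sees it,
>     // so user 1 stays in the result set forever.
>     tEnv.executeSql(
>         "SELECT * FROM user_states_src WHERE state = 'california'")
>         .print();
>   }
> }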
>
> Is this true? How do we prevent this from happening? We need to use
> compaction since all our jobs are based on CDC and we can't just drop data
> after x number of days.
>
> Thanks
>


-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>  |  BLOG <http://blog.remind.com/>  |
FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
<https://www.facebook.com/remindhq>
