Hi, vtygoss
> I'm working on migrating from a full-data pipeline (with Spark) to an
> incremental-data pipeline (with Flink CDC), and I met a problem with accuracy
> validation between the Flink-based and Spark-based pipelines.
Glad to hear that!
> For bounded data, it's simple to validate whether the two result sets are
> consistent.
Hi, all.
From my understanding, validating accuracy for the sync pipeline requires
snapshotting the source and sink at some point. It is just like having a
checkpoint that contains all the data at some time for both the sink and the
source. Then we can compare the contents of the two snapshots and find the
difference.
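A minimal sketch of that snapshot comparison, assuming both snapshots are
materialized as keyed rows (the function and variable names here are
illustrative, not from Flink or Spark):

```python
# Sketch: compare two table snapshots taken at the same logical point in time.
# Each snapshot is assumed to be a dict mapping primary key -> row tuple.

def diff_snapshots(source_snapshot, sink_snapshot):
    """Return rows missing from the sink, rows extra in the sink,
    and rows whose values differ between the two snapshots."""
    missing = {k: v for k, v in source_snapshot.items() if k not in sink_snapshot}
    extra = {k: v for k, v in sink_snapshot.items() if k not in source_snapshot}
    changed = {
        k: (v, sink_snapshot[k])
        for k, v in source_snapshot.items()
        if k in sink_snapshot and sink_snapshot[k] != v
    }
    return missing, extra, changed

source = {1: ("alice", 10), 2: ("bob", 20), 3: ("carol", 30)}
sink = {1: ("alice", 10), 2: ("bob", 25), 4: ("dave", 40)}
missing, extra, changed = diff_snapshots(source, sink)
print(missing)  # {3: ('carol', 30)}
print(extra)    # {4: ('dave', 40)}
print(changed)  # {2: (('bob', 20), ('bob', 25))}
```

An empty result in all three buckets would mean the two snapshots agree at
that point in time.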
It's a good question. Let me ping @Leonard to share more thoughts.
Best,
Shengkai
On Fri, May 20, 2022 at 16:04, vtygoss wrote:
> Hi community!
>
>
> I'm working on migrating from a full-data pipeline (with Spark) to an
> incremental-data pipeline (with Flink CDC), and I met a problem with
> accuracy validation between the Flink-based and Spark-based pipelines.
Hi community!
I'm working on migrating from a full-data pipeline (with Spark) to an
incremental-data pipeline (with Flink CDC), and I met a problem with accuracy
validation between the Flink-based and Spark-based pipelines.
For bounded data, it's simple to validate whether the two result sets are
consistent.
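For the bounded case, one simple check (sketched below; the helper name is
illustrative and not from any library) is to canonicalize each result set,
e.g. sort its rows and hash them, then compare the digests:

```python
import hashlib

# Sketch: check whether two bounded result sets contain the same rows,
# ignoring row order. Duplicates are preserved by sorted(), so multisets
# with different multiplicities produce different digests.

def result_set_digest(rows):
    """Canonicalize rows by sorting their string form, then hash."""
    canonical = "\n".join(sorted(repr(r) for r in rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

spark_result = [(1, "a"), (2, "b"), (3, "c")]
flink_result = [(3, "c"), (1, "a"), (2, "b")]
print(result_set_digest(spark_result) == result_set_digest(flink_result))  # True
```

When the digests differ, a row-level diff (as in the snapshot approach above)
is needed to locate the discrepancy.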