I think for unbounded data, we can only check the result at one point in
time; that is what Watermark [1] does. What about tagging a point in time and
validating the data accuracy at that moment?

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/dev/table/sql/create/#watermark
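To illustrate the watermark idea above, here is a minimal sketch of a Flink SQL source table with a WATERMARK declaration (the table and column names are hypothetical, and the datagen connector is used just to keep the example self-contained):

```sql
-- Hypothetical source table. The WATERMARK clause declares order_time as
-- the event-time attribute and bounds out-of-orderness to 5 seconds: once
-- the watermark passes a timestamp, results up to that point can be checked.
CREATE TABLE orders (
    order_id   BIGINT,
    amount     DECIMAL(10, 2),
    order_time TIMESTAMP(3),
    WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
) WITH (
    'connector' = 'datagen'
);
```

The watermark then acts as the "tag": when it reaches a chosen timestamp, the result as of that moment can be compared against a reference.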



On 2022-05-20 16:02:39, "vtygoss" <vtyg...@126.com> wrote:

Hi community!




I'm working on migrating from a full-data pipeline (with Spark) to an
incremental data pipeline (with Flink CDC), and I ran into a problem with
accuracy validation between the Flink-based and Spark-based pipelines.




For bounded data, it's simple to validate whether the two result sets are
consistent.
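For the bounded case, one common sketch is a pair of symmetric EXCEPT queries, assuming both result sets are queryable as tables (the table names spark_result and flink_result are hypothetical):

```sql
-- Rows produced by the Spark pipeline but missing from the Flink pipeline.
SELECT * FROM spark_result
EXCEPT
SELECT * FROM flink_result;

-- Rows produced by the Flink pipeline but missing from the Spark pipeline.
SELECT * FROM flink_result
EXCEPT
SELECT * FROM spark_result;
```

Both queries returning zero rows indicates the two snapshots agree (up to duplicates, since EXCEPT is set-based).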

But for unbounded data and event-driven applications, how can we make sure
the produced data stream is correct, especially when there are retract
functions with a large impact, e.g. row_number?
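A typical query of the kind mentioned above is a Flink SQL Top-N based on ROW_NUMBER: as new rows arrive, previously emitted results are retracted, so any comparison has to be made against a settled snapshot rather than the raw changelog. A minimal sketch, with a hypothetical orders table:

```sql
-- Top 3 orders per category by amount. When the ranking changes, Flink
-- emits retractions for rows that fall out of the top 3, so the
-- downstream result is an updating stream, not append-only.
SELECT category, order_id, amount
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY category
               ORDER BY amount DESC) AS rn
    FROM orders
)
WHERE rn <= 3;
```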




Is there any documentation for this problem? Thanks for any suggestions or
replies.




Best Regards! 
