[DISCUSS] Data quality by Apache Flink

tanjialiang Wed, 12 Jan 2022 02:43:33 -0800

Hi everyone,

I would like to start a discussion thread on "Flink SQL support for data quality".


For example, I have a SQL job whose source table has a column named phone. I 
want to set a data-quality rule on the pattern of the values in that column, 
e.g. each value must match a telephone-number pattern. If a value does not 
match, I can choose to drop the record or ignore the violation. We can also 
record the quality results as metrics, so that users can monitor data quality 
at both the source and the sink.
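A minimal sketch of the per-column rule described above, written in plain Java so it runs without Flink. The class name `QualityRule`, the `DROP`/`IGNORE` policy enum, and the phone regex used in the usage example are all hypothetical. It also assumes that "ignore" means the record is still forwarded but the violation is counted; the two counters stand in for the metrics the proposal would report.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of a data-quality rule on a single column.
// Not a Flink API; names and semantics are illustrative assumptions.
final class QualityRule {
    enum OnMismatch { DROP, IGNORE }

    private final Pattern pattern;
    private final OnMismatch onMismatch;
    // Plain counters standing in for the metrics users would monitor.
    private long matched = 0;
    private long mismatched = 0;

    QualityRule(String regex, OnMismatch onMismatch) {
        this.pattern = Pattern.compile(regex);
        this.onMismatch = onMismatch;
    }

    /** Returns true if the record carrying this value should be forwarded downstream. */
    boolean check(String value) {
        boolean ok = value != null && pattern.matcher(value).matches();
        if (ok) {
            matched++;
            return true;
        }
        mismatched++;
        // DROP discards the record; IGNORE only counts the violation and forwards it.
        return onMismatch == OnMismatch.IGNORE;
    }

    long matched() { return matched; }
    long mismatched() { return mismatched; }
}
```

For instance, `new QualityRule("\\+?[0-9]{7,15}", QualityRule.OnMismatch.DROP)` accepts `"+8613800138000"` and rejects `"not-a-phone"`; what counts as a valid telephone number is left to whatever pattern the user configures.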

With this, users can see the data quality at both the source and the sink, 
which is very useful for downstream consumers.

How to do it:
In my opinion, we can define these quality options as table properties, and 
add a flatMap operator after the SourceOperator and before the SinkOperator. 
The flatMap operator applies the quality logic: it matches the pattern, drops 
or ignores non-matching records, and records the results in metrics that users 
can monitor.
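The operator described above could look roughly like this. It is a sketch under several assumptions, not an existing Flink feature: the property keys (`quality.<column>.pattern`, `quality.<column>.on-mismatch`) are hypothetical (in Flink SQL such table properties would most naturally live in the `WITH (...)` clause of `CREATE TABLE`), rows are modeled as `Map<String, String>`, and a `Consumer` replaces Flink's `Collector` so the code runs without Flink on the classpath. In a real job this logic would sit in a `RichFlatMapFunction`, whose `open()` could register counters through `getRuntimeContext().getMetricGroup().counter(...)`.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Consumer;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed quality operator, configured from table properties.
// The property keys below are illustrative; Flink defines no such options.
final class QualityFlatMap {
    private final String column;
    private final Pattern pattern;
    private final boolean dropOnMismatch;
    private long passed = 0; // would be a Flink metrics Counter in a real operator
    private long failed = 0;

    QualityFlatMap(String column, Map<String, String> tableProperties) {
        this.column = column;
        String prefix = "quality." + column + ".";
        String regex = Objects.requireNonNull(
                tableProperties.get(prefix + "pattern"), "missing " + prefix + "pattern");
        this.pattern = Pattern.compile(regex);
        // Default policy is "ignore": count the violation but keep the row.
        this.dropOnMismatch =
                "drop".equals(tableProperties.getOrDefault(prefix + "on-mismatch", "ignore"));
    }

    /** Same shape as a flatMap: one row in, zero or one rows emitted to {@code out}. */
    void flatMap(Map<String, String> row, Consumer<Map<String, String>> out) {
        String value = row.get(column);
        boolean ok = value != null && pattern.matcher(value).matches();
        if (ok) {
            passed++;
        } else {
            failed++;
        }
        if (ok || !dropOnMismatch) {
            out.accept(row);
        }
    }

    long passed() { return passed; }
    long failed() { return failed; }
}
```

One instance placed after the source and another before the sink would give two independent pass/fail counts, which is what lets users compare data quality at both ends of the job.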

This is a draft. Does anyone have any good ideas?

Sent from Mail for Windows
