Thanks, Andrey, I will check it out.

On Mon, Jun 8, 2020 at 8:10 PM Andrey Zagrebin <azagre...@apache.org> wrote:

> Hi Anuj,
>
> I am not familiar with data quality measurement methods and deequ
> <https://github.com/awslabs/deequ> in depth.
> What you describe looks like monitoring some data metrics.
> Maybe other community users are aware of a better solution.
> Meanwhile, I would recommend implementing the checks and failures as
> separate operators and side outputs (for streaming) [1], if you have not
> done so yet.
> Then you could also use Flink metrics to aggregate and monitor the data
> [2].
> Metrics systems usually allow you to define alerts on metrics, as in
> Prometheus [3], [4].
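
A minimal sketch of the pattern suggested here, written in plain Python rather than against the Flink API: records that pass a check go to the main output, failures go to a separate "side" list, and simple counters stand in for Flink metrics. The names `split_by_check` and `has_user_id` and the `user_id` field are hypothetical, chosen only for illustration. In a real Flink job the routing would typically live in a ProcessFunction that emits failures through an OutputTag, with the counters registered on the operator's metric group.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]


def split_by_check(
    records: Iterable[Record],
    check: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record], Counter]:
    """Route each record to the main output or to the failure (side) output."""
    main_output: List[Record] = []
    side_output: List[Record] = []
    counters: Counter = Counter()  # stand-in for Flink metric counters
    for record in records:
        counters["records_in"] += 1
        if check(record):
            main_output.append(record)
        else:
            side_output.append(record)
            counters["records_failed"] += 1
    return main_output, side_output, counters


def has_user_id(record: Record) -> bool:
    # Hypothetical schema rule: "user_id" must be present and non-empty.
    return record.get("user_id") not in (None, "")


events = [{"user_id": "a1"}, {"user_id": None}, {"page": "/home"}, {"user_id": "b2"}]
valid, failed, counters = split_by_check(events, has_user_id)
```

Keeping the failures as their own stream means they can be counted, sampled, or written to an error queue without slowing down the main path.
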
>
> Best,
> Andrey
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/side_output.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html
> [3]
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/metrics.html#prometheus-orgapacheflinkmetricsprometheusprometheusreporter
> [4] https://prometheus.io/docs/alerting/overview/
>
> On Sat, Jun 6, 2020 at 9:23 AM aj <ajainje...@gmail.com> wrote:
>
>> Hello All,
>>
>> I want to do some data quality analysis on stream data, for example:
>>
>> 1. Fill rate in a particular column
>> 2. How many events go to the error queue because schema validation
>> failed?
>> 3. Different statistical measures of a column.
>> 4. Alert if a particular threshold is breached (for example, if the fill
>> rate is less than 90% for a column)
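
To make items 1 and 4 concrete, here is a small sketch in plain Python (not tied to Flink or deequ) of a fill-rate calculation over one window of records, plus a threshold check. The function names, the `email` column, and the choice to treat an empty window as "no breach" are assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional


def fill_rate(records: Iterable[Dict[str, object]], column: str) -> Optional[float]:
    """Fraction of records whose `column` is present and not None."""
    total = 0
    filled = 0
    for record in records:
        total += 1
        if record.get(column) is not None:
            filled += 1
    # An empty window has no meaningful rate, so return None instead of dividing by zero.
    return filled / total if total else None


def breaches_threshold(rate: Optional[float], minimum: float = 0.9) -> bool:
    # Assumption: an empty window (rate is None) is not reported as a breach.
    return rate is not None and rate < minimum


window = [{"email": "a@example.com"}, {"email": None}, {"email": "c@example.com"}, {}]
rate = fill_rate(window, "email")
alert = breaches_threshold(rate)
```

In a streaming job the same calculation would run once per window, and the resulting rate could be exposed as a gauge so that the alerting rule lives in the metrics system rather than in the job.
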
>>
>> Is there any library on top of Flink for data quality? From what I have
>> found, there is such a library on top of Spark:
>> https://github.com/awslabs/deequ
>>
>> It provides everything I am looking for.
>>
>> --
>> Thanks & Regards,
>> Anuj Jain
>>
>>
>>
>> <http://www.cse.iitm.ac.in/%7Eanujjain/>
>>
>

-- 
Thanks & Regards,
Anuj Jain
Mob. : +91- 8588817877
Skype : anuj.jain07
<http://www.oracle.com/>


<http://www.cse.iitm.ac.in/%7Eanujjain/>
