Vigo, I mean that the algorithm is a standalone piece of code. There are no examples that I am aware of for running it using Flink.
Ryan

From: Salvador Vigo <salvador...@gmail.com>
Sent: Saturday, April 4, 2020 12:26 AM
To: Marta Paes Moreira <ma...@ververica.com>
Cc: Nienhuis, Ryan <nienh...@amazon.com>; user <user@flink.apache.org>
Subject: RE: [EXTERNAL] Anomaly detection Apache Flink

Thanks for the answers.

@Marta, regarding the videos [1] and [2] from your first answer: it was interesting to see these two different approaches, although I was looking for a more specific implementation. As for link [3], I didn't know Kinesis existed, so it could be good for benchmarking and comparing my results with the Kinesis results. As for the CEP approach, I am very familiar with this topic, since my current work is based on the implementation of a CEP pipeline for monitoring. The only problem I see here is that you need a predefined pattern in advance. But it is worth a try.

@Ryan, the random cut forest algorithm seems closer to the idea I am looking for. What do you mean when you say that it doesn't help with getting it working with Flink?

Best,

On Fri, Apr 3, 2020 at 8:47 PM Marta Paes Moreira <ma...@ververica.com> wrote:

Forgot to mention that you might also want to have a look into Flink CEP [1], Flink's library for Complex Event Processing. It allows you to define and detect event patterns over streams, which can come in pretty handy for anomaly detection.

[1] https://ci.apache.org/projects/flink/flink-docs-stable/dev/libs/cep.html

On Fri, Apr 3, 2020 at 6:08 PM Nienhuis, Ryan <nienh...@amazon.com> wrote:

I would also have a look at the random cut forest algorithm. This is the base algorithm that is used for anomaly detection in several AWS services (QuickSight, Kinesis Data Analytics, etc.).
It doesn't help with getting it working with Flink, but it may be a good place to start for an algorithm.

https://github.com/aws/random-cut-forest-by-aws

Ryan

From: Marta Paes Moreira <ma...@ververica.com>
Sent: Friday, April 3, 2020 5:25 AM
To: Salvador Vigo <salvador...@gmail.com>
Cc: user <user@flink.apache.org>
Subject: RE: [EXTERNAL] Anomaly detection Apache Flink

Hi, Salvador.

You can find some more examples of real-time anomaly detection with Flink in these presentations from Microsoft [1] and Salesforce [2] at Flink Forward. This blogpost [3] also describes how to build that kind of application using Kinesis Data Analytics (based on Flink).

Let me know if these resources help!

[1] https://www.youtube.com/watch?v=NhOZ9Q9_wwI
[2] https://www.youtube.com/watch?v=D4kk1JM8Kcg
[3] https://towardsdatascience.com/real-time-anomaly-detection-with-aws-c237db9eaa3f

On Fri, Apr 3, 2020 at 11:37 AM Salvador Vigo <salvador...@gmail.com> wrote:

Hi there,

I am working on an approach for running some experiments on real-time anomaly detection with Apache Flink. I would like to know if there are already some open issues in the community. The only example I found was the one from Scott Kidder (https://mux.com/team/scott-kidder) and the Mux platform, from 2017. If anyone is already working on this topic or knows of related work or publications, I would be grateful.

Best,
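
Editor's illustration of the point raised in the thread that a CEP-style approach needs a pattern defined in advance. This is a minimal plain-Python sketch, not Flink CEP and not code from any of the participants. The pattern ("N consecutive readings above a threshold"), the function name `detect_runs`, and the sample values are all assumptions chosen for illustration.

```python
from typing import Iterable, Iterator, List, Tuple


def detect_runs(
    readings: Iterable[float], threshold: float, run_length: int
) -> Iterator[Tuple[int, List[float]]]:
    """Yield (start_index, values) whenever `run_length` consecutive
    readings exceed `threshold`. Matches do not overlap."""
    run: List[float] = []
    for i, value in enumerate(readings):
        if value > threshold:
            run.append(value)
            if len(run) == run_length:
                yield (i - run_length + 1, list(run))
                run.clear()  # start looking for the next, non-overlapping match
        else:
            run.clear()  # the pattern was broken, so start over


if __name__ == "__main__":
    # Hypothetical sensor readings, made up for this example.
    stream = [10, 12, 95, 97, 99, 11, 96, 13, 98, 99, 99]
    for start, values in detect_runs(stream, threshold=90, run_length=3):
        print(start, values)
```

The matcher only reports what the hard-coded pattern describes, which is the limitation noted above. An unusual sequence that does not fit the pattern, such as a slow drift that stays under the threshold, goes unreported. That is where a score-based method like random cut forest would differ.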