Thank you for your response.

The algorithm I am proposing is Isolation Forest.
Link to paper:
<https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkdd11.pdf>. I
think it should be included in Spark ML because many industry
applications that use Spark as part of a real-time streaming engine need
anomaly detection, and Spark ML currently supports it only to some
extent, by means of clustering. I will probably start on the implementation
and prepare a proposal as you suggested.

It is interesting to know that Spark ML is still being extended to reach
full parity with MLlib. Could I please be connected with the folks
working on it, as I am interested in contributing? I have been a heavy user of
Spark since summer '15.

 Cheers!
-Venali

On Thu, Sep 21, 2017 at 1:33 AM, Seth Hendrickson <
seth.hendrickso...@gmail.com> wrote:

> I'm not exactly clear on what you're proposing, but this sounds like
> something that would live as a Spark package - a framework for anomaly
> detection built on Spark. If there is some specific algorithm you have in
> mind, it would be good to propose it on JIRA and discuss why you think it
> needs to be included in Spark and not live as a Spark package.
>
> In general, there will probably be resistance to including new algorithms
> in Spark ML, especially until the ML package has reached full parity with
> MLlib. Still, if you can provide more details that will help to understand
> what is best here.
>
> On Thu, Sep 14, 2017 at 1:29 AM, Venali Sonone <venalis...@gmail.com>
> wrote:
>
>>
>> Hello,
>>
>> I am new to the Spark dev community and to open source in general, but
>> have used Spark extensively.
>> I want to create a complete component for anomaly detection in Spark MLlib.
>> To that end, I would like to know if someone could guide me so I can start
>> development and contribute to Spark MLlib.
>>
>> Sorry if I sound naive, but any help is appreciated.
>>
>> Cheers!
>> -venna
>>
>>
>
