Hi Venna,

This sounds like a very interesting algorithm. I have to agree with Seth: in
the end you don't want to add a lot of algorithms to Spark itself, because
that would blow up the codebase and the tests would run forever. You could
also consider publishing it on the Spark Packages website. I've published an
outlier detection package there myself:
https://spark-packages.org/package/Fokko/spark-stochastic-outlier-selection

Cheers, Fokko

2017-09-22 2:10 GMT+02:00 Venali Sonone <venalis...@gmail.com>:

> Thank you for your response.
>
> The algorithm that I am proposing is Isolation Forest.
> Link to the paper:
> <https://cs.nju.edu.cn/zhouzh/zhouzh.files/publication/tkdd11.pdf>. I
> think it should be included in Spark ML because many industry
> applications that use Spark as part of a real-time streaming engine need
> anomaly detection, and Spark ML currently supports it only to some extent,
> by means of clustering. I will probably start on the implementation and
> prepare a proposal as you suggested.
>
> It is interesting to know that Spark ML is still being extended to reach
> full parity with MLlib. Could I please be connected to the folks working
> on that? I am interested in contributing. I have been a heavy user of
> Spark since summer '15.
>
>  Cheers!
> -Venali
>
> On Thu, Sep 21, 2017 at 1:33 AM, Seth Hendrickson <
> seth.hendrickso...@gmail.com> wrote:
>
>> I'm not exactly clear on what you're proposing, but this sounds like
>> something that would live as a Spark package - a framework for anomaly
>> detection built on Spark. If there is some specific algorithm you have in
>> mind, it would be good to propose it on JIRA and discuss why you think it
>> needs to be included in Spark and not live as a Spark package.
>>
>> In general, there will probably be resistance to including new algorithms
>> in Spark ML, especially until the ML package has reached full parity with
>> MLlib. Still, if you can provide more details, that will help us
>> understand what is best here.
>>
>> On Thu, Sep 14, 2017 at 1:29 AM, Venali Sonone <venalis...@gmail.com>
>> wrote:
>>
>>>
>>> Hello,
>>>
>>> I am new to the Spark dev community, and to open source in general, but
>>> I have used Spark extensively.
>>> I want to build a complete anomaly detection component for Spark MLlib.
>>> Could someone guide me so that I can start development and contribute
>>> it to Spark MLlib?
>>>
>>> Sorry if this sounds naive, but any help is appreciated.
>>>
>>> Cheers!
>>> -venna
>>>
>>>
>>
>
