Hi Mayur,
Thanks for your suggestion.
In fact, that's what I'm thinking about: passing that data on and returning
only the percentage of outliers in a particular window.
I also have some doubts about whether I could implement the outlier detection
on the RDD as you have suggested.
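
A minimal sketch of that idea in PySpark, assuming the data for one window is available as an RDD of numbers. The outlier rule (a fixed distance from a fixed center), the function names, and the sample values are illustrative assumptions, not something taken from this thread. It uses `RDD.aggregate` so that only two counters come back to the driver instead of the collected data:

```python
# Hypothetical outlier rule: a value is an outlier when it lies farther than
# `max_dev` from `center`. Both defaults are assumptions for illustration.
def is_outlier(value, center=50.0, max_dev=20.0):
    return abs(value - center) > max_dev


def seq_op(acc, value):
    """Fold one value into an (outlier_count, total_count) pair."""
    outliers, total = acc
    return (outliers + (1 if is_outlier(value) else 0), total + 1)


def comb_op(a, b):
    """Merge the partial counts computed on two partitions."""
    return (a[0] + b[0], a[1] + b[1])


def outlier_percentage(rdd):
    """Return the percentage of outliers in an RDD of numbers.

    The counting runs on the executors; only the final pair of counters
    is returned to the driver, so no collect() on the data is needed.
    """
    outliers, total = rdd.aggregate((0, 0), seq_op, comb_op)
    return 100.0 * outliers / total if total else 0.0


def run_local_demo():
    """Try the sketch on a local Spark instance (requires pyspark)."""
    from pyspark import SparkContext

    sc = SparkContext("local[2]", "outlier-sketch")
    try:
        # Made-up sample values standing in for one window of data.
        window = sc.parallelize([48.0, 52.0, 51.0, 120.0, 49.0, 3.0, 50.0, 47.0])
        print(outlier_percentage(window))
    finally:
        sc.stop()
```

Calling `run_local_demo()` on a machine with Spark installed runs the whole thing locally. A real detector would probably replace `is_outlier` with something derived from the data itself, for example a rule based on the mean and standard deviation computed in an earlier pass.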
From what I understand that those
Calling collect on anything is almost always a bad idea. The only
exception is if you are looking to pass that data on to another system &
never see it again :).
I would say you need to implement outlier detection on the RDD & process it
in Spark itself rather than calling collect on it.
Regar