Re: Spark Streaming for time consuming job

2014-10-02 Thread Eko Susilo
Hi Mayur, thanks for your suggestion. In fact, that's what I'm thinking about: to pass those data on and return only the percentage of outliers in a particular window. I also have some doubts about whether I should implement the outlier detection on the RDD as you have suggested. From what I understand, those

Re: Spark Streaming for time consuming job

2014-10-01 Thread Mayur Rustagi
Calling collect on anything is almost always a bad idea. The only exception is if you are looking to pass that data on to another system and never see it again :). I would say you need to implement outlier detection on the RDD and process it in Spark itself rather than calling collect on it. Regar
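
To illustrate the "process it in Spark, don't collect" advice, here is a minimal sketch in plain Python of the aggregation pattern that could be used. The thread does not say which outlier method is in use, so the z-score rule (a point is an outlier if it lies more than k standard deviations from the mean) is purely an assumption, and all names (seq_op, comb_op, outlier_percentage) are hypothetical. The partitions are simulated with nested lists so the sketch runs without a cluster; in PySpark the same two functions could be passed to rdd.aggregate(ZERO, seq_op, comb_op), followed by a filter(...).count() for the second pass, so that only a few numbers ever reach the driver.

```python
from functools import reduce
from math import sqrt

# Accumulator layout: (count, sum, sum of squares).
ZERO = (0, 0.0, 0.0)


def seq_op(acc, x):
    """Fold one value into a partition-local accumulator."""
    n, s, sq = acc
    return (n + 1, s + x, sq + x * x)


def comb_op(a, b):
    """Merge the accumulators of two partitions."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def outlier_percentage(partitions, k=3.0):
    """Percentage of values further than k standard deviations from the mean.

    `partitions` is a list of lists standing in for the partitions of an RDD.
    The z-score rule is an assumption made for this sketch only.
    """
    # Pass 1: each partition is reduced on its own, then the small
    # per-partition summaries are merged (what rdd.aggregate would do).
    partials = [reduce(seq_op, part, ZERO) for part in partitions]
    n, s, sq = reduce(comb_op, partials, ZERO)
    if n == 0:
        return 0.0

    mean = s / n
    variance = max(sq / n - mean * mean, 0.0)  # clamp rounding noise
    std = sqrt(variance)
    if std == 0.0:
        return 0.0  # all values identical, nothing to flag

    # Pass 2: count outliers per partition; only the counts are combined.
    outliers = sum(
        sum(1 for x in part if abs(x - mean) > k * std)
        for part in partitions
    )
    return 100.0 * outliers / n
```

For example, outlier_percentage([[10.0] * 10, [10.0] * 9 + [1000.0]]) flags the single 1000.0 among 20 values. The point of the design is that the raw window data stays distributed and only the summary tuple and the final count are returned.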