Re: Spark Streaming for time consuming job

2014-10-02 Thread Eko Susilo
Hi Mayur, thanks for your suggestion. In fact, that's what I'm thinking about: passing that data on and returning only the percentage of outliers in a particular window. I also have some doubts about implementing the outlier detection on the RDD as you suggested. From what I understand, those
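A minimal sketch of "return only the percentage of outliers in a window", assuming a z-score rule: a value counts as an outlier if it lies more than k population standard deviations from the window mean. The thread never says which detector is used, so the rule, the function name outlier_percentage and the default k=3 are all hypothetical. Plain Python stands in for the per-window computation; in Spark Streaming the same logic would run on each windowed RDD instead of on collected data.

```python
import statistics

def outlier_percentage(window_values, k=3.0):
    """Return the percentage (0-100) of values in one window that are outliers.

    Hypothetical rule: a value is an outlier if |x - mean| > k * population stddev.
    """
    if not window_values:
        return 0.0  # empty window: nothing to flag
    mean = statistics.fmean(window_values)
    std = statistics.pstdev(window_values, mu=mean)
    outliers = sum(1 for x in window_values if abs(x - mean) > k * std)
    return 100.0 * outliers / len(window_values)

window = [10.0] * 9 + [100.0]
print(outlier_percentage(window, k=2.0))  # 10.0
```

Only one number per window leaves this function, which is the "return only the percentage" idea; the raw values never need to be shipped anywhere else.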

Re: Spark Streaming for time consuming job

2014-10-01 Thread Mayur Rustagi
Calling collect on anything is almost always a bad idea. The only exception is if you are looking to pass that data on to some other system & never see it again :). I would say you need to implement the outlier detection on the RDD & process it in Spark itself rather than calling collect on it. Regar
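To illustrate the point about not calling collect, here is a plain-Python sketch of the aggregate-then-return-small-results pattern: each partition is folded into a small (count, sum, sum of squares) tuple, the tuples are merged, and a second pass counts outliers per partition, so only a handful of numbers would ever reach the driver. Lists of lists stand in for RDD partitions; in PySpark the analogous calls would be RDD.aggregate(zeroValue, seqOp, combOp) for the first pass and filter(...).count() for the second. The helper names and the z-score threshold k are hypothetical, not something the thread specifies.

```python
import math
from functools import reduce

ZERO = (0, 0.0, 0.0)  # (count, sum, sum of squares)

def seq_op(acc, x):
    """Fold one value into a partition-local accumulator."""
    n, s, ss = acc
    return (n + 1, s + x, ss + x * x)

def comb_op(a, b):
    """Merge two partial accumulators (e.g. from different partitions)."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def outlier_percentage_distributed(partitions, k=3.0):
    """Percentage of outliers across partitions without gathering raw values.

    Hypothetical rule: |x - mean| > k * population stddev.
    """
    # Pass 1: each partition yields only a 3-tuple; the tuples are merged.
    partials = [reduce(seq_op, part, ZERO) for part in partitions]
    n, s, ss = reduce(comb_op, partials, ZERO)
    if n == 0:
        return 0.0
    mean = s / n
    # Clamp at 0 to absorb tiny negative values caused by floating-point rounding.
    std = math.sqrt(max(ss / n - mean * mean, 0.0))
    # Pass 2: each partition contributes only its outlier count.
    outliers = sum(
        sum(1 for x in part if abs(x - mean) > k * std) for part in partitions
    )
    return 100.0 * outliers / n

parts = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0], [10.0, 10.0, 10.0, 100.0]]
print(outlier_percentage_distributed(parts, k=2.0))  # 10.0
```

The sum-of-squares variance formula keeps the accumulator tiny but can lose precision when values are large relative to their spread; a Welford-style mergeable accumulator is the more robust choice if that matters.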

Spark Streaming for time consuming job

2014-09-30 Thread Eko Susilo
Hi all, I have a Spark Streaming problem I would like to consult you about. I have a Spark Streaming application that parses a file (which keeps growing as time passes). This file contains lines of numbers split into several columns, and the parsing is divided into windows (1 minute each). Each
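The message above is cut off before the details, so this is only a hedged sketch of the setup it describes: lines of numeric columns (whitespace-separated, which is an assumption) grouped into tumbling 1-minute windows by arrival time. That roughly mirrors Spark Streaming's window(windowDuration, slideDuration) with equal durations, although Spark aligns windows to batch times rather than to whole minutes. The (arrival_seconds, line) pairs and all helper names are hypothetical. One caveat for a growing file: StreamingContext.textFileStream monitors a directory for new files, and data appended to a file it has already picked up is not read, so a single growing file may need to be rotated into new files.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # 1-minute tumbling windows, as described in the message

def parse_line(line):
    """Split one line into numeric columns (assumes whitespace-separated floats)."""
    return [float(tok) for tok in line.split()]

def group_into_windows(records, window_seconds=WINDOW_SECONDS):
    """Group (arrival_seconds, line) pairs into tumbling windows.

    Returns {window_start_seconds: [parsed rows]}, skipping blank lines.
    """
    windows = defaultdict(list)
    for arrival, line in records:
        if not line.strip():
            continue  # ignore empty lines
        start = int(arrival // window_seconds) * window_seconds
        windows[start].append(parse_line(line))
    return dict(windows)

def column(rows, index):
    """Extract one numeric column from a window's parsed rows."""
    return [row[index] for row in rows]

records = [(5, "1.0 2.0"), (59, "3.0 4.0"), (61, "5.0 6.0"), (70, "")]
windows = group_into_windows(records)
print(windows)                # {0: [[1.0, 2.0], [3.0, 4.0]], 60: [[5.0, 6.0]]}
print(column(windows[0], 1))  # [2.0, 4.0]
```

Each extracted column could then be passed to a per-window outlier check, so that only a summary per window is kept.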