Fwd: Sequential computation over several partitions

2016-06-09 Thread Jeroen Miller
Hello, On Wed, Jun 8, 2016 at 12:59 AM, Mich Talebzadeh wrote: > > one thing you may consider is using something like flume to store > data on hdfs. [...] Thank you for your sensible suggestions. > Have you thought of other tools besides Spark? No, at least not seriously yet. Flume looks like a

Re: Sequential computation over several partitions

2016-06-07 Thread Mich Talebzadeh
Am I correct in understanding that you want to read and iterate over all the data for the result to be correct? For example, if a user is already unsubscribed, then you want to ignore all the subsequent subscribes regardless. How often do you want to iterate through the full data, i.e. what is the frequency of your analysis? The iss

Sequential computation over several partitions

2016-06-07 Thread Jeroen Miller
Dear fellow Sparkers, I am a new Spark user and I am trying to solve a (conceptually simple) problem which may not be a good use case for Spark, at least for the RDD API. But before I turn my back on it, I would rather have the opinion of more knowledgeable developers than me, as it is highly like