I find monoids pretty useful in this respect: separate the aggregation logic
out into a monoid, then apply it to either the stream or the batch. A list of
such practices could be really useful.
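
A minimal sketch of what I mean, with an illustrative Monoid trait (in
practice a library such as Algebird already provides these, and the names
below are just placeholders): the combine logic lives in one place and is
handed to reduceByKey on the batch side and to updateStateByKey on the
streaming side.

import scala.reflect.ClassTag
import org.apache.spark.SparkContext._               // pair-RDD implicits (older Spark versions)
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits (older Spark versions)
import org.apache.spark.streaming.dstream.DStream

// The shared business logic: an associative combine plus an identity element.
trait Monoid[V] extends Serializable {
  def zero: V
  def plus(a: V, b: V): V
}

// Example instance: (count, sum), enough to derive an average downstream.
object CountSumMonoid extends Monoid[(Long, Double)] {
  val zero = (0L, 0.0)
  def plus(a: (Long, Double), b: (Long, Double)) = (a._1 + b._1, a._2 + b._2)
}

// Batch: per-key aggregation over a single RDD.
def aggregateBatch[K: ClassTag, V: ClassTag](events: RDD[(K, V)],
                                             m: Monoid[V]): RDD[(K, V)] =
  events.reduceByKey(m.plus)

// Streaming: the same monoid folded into running state across micro-batches
// (updateStateByKey requires a checkpoint directory to be set on the context).
def aggregateStream[K: ClassTag, V: ClassTag](events: DStream[(K, V)],
                                              m: Monoid[V]): DStream[(K, V)] =
  events.updateStateByKey[V] { (batch: Seq[V], state: Option[V]) =>
    Some((state.getOrElse(m.zero) +: batch).reduce(m.plus))
  }

CountSumMonoid is the only piece that knows what "combine" means, and both
jobs share it unchanged.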

On Thu, Feb 19, 2015 at 12:26 AM, Jean-Pascal Billaud <j...@tellapart.com>
wrote:

> Hey,
>
> It seems pretty clear that one of the strengths of Spark is being able to
> share your code between your batch and streaming layers. However, given that
> Spark Streaming uses DStreams, each being a sequence of RDDs, while batch
> Spark works on a single RDD, there might be some complexity associated with
> it.
>
> Of course, since a DStream is just a sequence of RDDs, one can run the same
> code at the RDD granularity using DStream::foreachRDD. While this should
> work for map, I am not sure how that can work when it comes to the reduce
> phase, given that a group of keys can span multiple RDDs.
>
> One option is to change the dataset object that a job works on. For example,
> instead of passing an RDD to a class method, one passes a higher-level
> object (MetaRDD) that wraps either an RDD or a DStream depending on the
> context. The job then calls its regular maps, reduces and so on, and the
> MetaRDD wrapper delegates accordingly.
>
> I would just like to know the official best practice from the Spark
> community, though.
>
> Thanks,
>
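
For what it's worth, below is a rough sketch of the foreachRDD/transform
route (the function names are illustrative, not an established API), assuming
the shared logic is written as a plain RDD-to-RDD function. Note that reducing
inside each micro-batch only combines keys within that batch; keys spanning
batches need a windowed or stateful operator.

import org.apache.spark.SparkContext._               // pair-RDD implicits (older Spark versions)
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.StreamingContext._ // pair-DStream implicits (older Spark versions)
import org.apache.spark.streaming.dstream.DStream

// Shared logic, written once against plain RDDs: parse "key,value" lines and
// count occurrences per key.
def countByKey(lines: RDD[String]): RDD[(String, Long)] =
  lines.map(line => (line.split(",")(0), 1L)).reduceByKey(_ + _)

// Batch job: call it directly on the input RDD.
def runBatch(lines: RDD[String]): RDD[(String, Long)] =
  countByKey(lines)

// Streaming job: reuse it per micro-batch via transform (or foreachRDD for
// output actions). This only reduces keys *within* each micro-batch.
def runStreamPerBatch(lines: DStream[String]): DStream[(String, Long)] =
  lines.transform(rdd => countByKey(rdd))

// When the same key must be reduced *across* micro-batches, use a windowed
// or stateful operator instead, e.g. a 60-second window.
def runStreamWindowed(lines: DStream[String]): DStream[(String, Long)] =
  lines.map(line => (line.split(",")(0), 1L))
       .reduceByKeyAndWindow(_ + _, Seconds(60))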

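And a bare-bones sketch of the wrapper ("MetaRDD") idea, again with
hypothetical class names: the job code sees only the common trait, and the
caller decides whether an RDD or a DStream sits underneath.

import scala.reflect.ClassTag
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// A minimal common interface that both an RDD and a DStream can hide behind.
trait DistCollection[T] extends Serializable {
  def map[U: ClassTag](f: T => U): DistCollection[U]
  def filter(p: T => Boolean): DistCollection[T]
}

class BatchCollection[T](val rdd: RDD[T]) extends DistCollection[T] {
  def map[U: ClassTag](f: T => U) = new BatchCollection(rdd.map(f))
  def filter(p: T => Boolean)     = new BatchCollection(rdd.filter(p))
}

class StreamCollection[T](val stream: DStream[T]) extends DistCollection[T] {
  def map[U: ClassTag](f: T => U) = new StreamCollection(stream.map(f))
  def filter(p: T => Boolean)     = new StreamCollection(stream.filter(p))
}

// The shared job logic is written once against DistCollection, never against
// RDD or DStream directly.
def cleanEvents(events: DistCollection[String]): DistCollection[String] =
  events.filter(_.nonEmpty).map(_.trim)

Keyed operations like reduceByKey would need a pair-specialized flavour of
the trait (they live in PairRDDFunctions/PairDStreamFunctions rather than on
RDD/DStream themselves), but the delegation pattern is the same, and that is
where the monoid from my earlier sketch plugs in.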


-- 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
