Re: what is the best way to implement mini batches?

Matei Zaharia Thu, 11 Dec 2014 11:37:31 -0800

You can call mapPartitions on the whole RDD and then call sliding() on the 
iterator in each partition to get a sliding window. One problem is that a 
window cannot slide "forward" into the next partition at a partition boundary. 
If this matters to you, you need to do something more complicated to get those 
windows, such as the repartitioning you described (where you map each record 
to the partition it should be in).
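
To make the boundary issue concrete, here is a rough sketch in plain Scala that 
stands in for Spark: each inner Seq plays the role of one partition, and the 
names windowsIn and withLookahead are made up for this illustration (they are 
not Spark APIs). With a real RDD the first function would be passed along as 
rdd.mapPartitions(it => windowsIn(it, 3)). The lookahead step is only one 
possible way to recover the missing windows, written under the assumption that 
copying the next size - 1 records into each partition is acceptable; on a 
cluster that copy would need a shuffle or similar.

```scala
// Sliding windows inside a single partition's iterator.
// withPartial(false) drops a trailing window shorter than `size`.
def windowsIn[T](it: Iterator[T], size: Int): Iterator[List[T]] =
  it.sliding(size).withPartial(false).map(_.toList)

// Stand-in for an RDD with two partitions (assumption: local simulation only).
val partitions: Seq[Seq[Int]] = Seq(Seq(1, 2, 3, 4), Seq(5, 6, 7))

// Equivalent of rdd.mapPartitions(it => windowsIn(it, 3)):
// windows never cross from one partition into the next.
val windows: Seq[List[Int]] =
  partitions.flatMap(p => windowsIn(p.iterator, 3))
// windows: List(1, 2, 3), List(2, 3, 4), List(5, 6, 7)
// missing: List(3, 4, 5) and List(4, 5, 6), which straddle the boundary

// Hypothetical fix: extend each partition with the first (size - 1)
// records that follow it, so boundary windows can be formed locally.
def withLookahead[T](parts: Seq[Seq[T]], size: Int): Seq[Seq[T]] =
  parts.indices.map { i =>
    parts(i) ++ parts.drop(i + 1).flatten.take(size - 1)
  }

val allWindows: Seq[List[Int]] =
  withLookahead(partitions, 3).flatMap(p => windowsIn(p.iterator, 3))
// allWindows now matches (1 to 7).sliding(3)
```

Each extended partition produces only the windows that start in its own 
records, so no window is emitted twice.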


Matei

> On Dec 11, 2014, at 10:16 AM, ll <duy.huynh....@gmail.com> wrote:
> 
> any advice/comment on this would be much appreciated.  
> 
> 
> 
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/what-is-the-best-way-to-implement-mini-batches-tp20264p20635.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
> 


