I think it could be done like this:
1. Use mapPartitions to randomly drop some partitions.
2. Randomly drop some elements within the selected partitions.
3. Compute the gradient step on the selected elements.
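
Here is a minimal sketch of those three steps against the Scala RDD API, assuming the data is an RDD of (label, features) pairs. The function name, the keep probabilities, and the squared-loss gradient helper are illustrative placeholders, not an established implementation.

import scala.util.Random
import org.apache.spark.rdd.RDD

def miniBatchGradient(
    data: RDD[(Double, Array[Double])],
    weights: Array[Double],
    partitionKeepProb: Double,   // chance a partition contributes at all
    elementKeepProb: Double,     // chance an element within it is used
    seed: Long): (Array[Double], Long) = {

  // Hypothetical per-example gradient for squared loss: (w.x - y) * x
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
    x.map(_ * err)
  }

  val perExample = data.mapPartitionsWithIndex { (idx, iter) =>
    val rng = new Random(seed ^ idx)
    if (rng.nextDouble() >= partitionKeepProb) Iterator.empty        // 1. drop this partition
    else iter.filter(_ => rng.nextDouble() < elementKeepProb)        // 2. drop some elements
             .map { case (y, x) => (gradient(weights, x, y), 1L) }   // 3. per-example gradient
  }

  // Sum the surviving gradients and their count; the driver can then average
  // them and take one step. aggregate also handles the case where everything
  // was dropped, unlike reduce on a possibly empty RDD.
  perExample.aggregate((Array.fill(weights.length)(0.0), 0L))(
    { case ((g, n), (gi, ni)) => (g.zip(gi).map { case (a, b) => a + b }, n + ni) },
    { case ((g1, n1), (g2, n2)) => (g1.zip(g2).map { case (a, b) => a + b }, n1 + n2) }
  )
}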
I don't think a fixed batch size is needed, but it could be done:
1. Call zipWithIndex on the RDD.
2. Create a ShuffledRDD keyed by the index (e.g. use index / 10 as the key) so the elements of each batch end up together (a sketch follows at the end of this message).
3. Use mapPartitions to compute each batch.

I also have a question: can mini-batches run in parallel? Running all the batches in parallel against the same weights seems equivalent to a full-batch GD in some cases.
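
And here is a minimal sketch of the fixed-size-batch steps above, again assuming (label, features) pairs. The batchSize parameter, the HashPartitioner choice, and the per-example gradient helper are illustrative assumptions; partitionBy with a new partitioner is what produces the ShuffledRDD mentioned in step 2.

import org.apache.spark.HashPartitioner
import org.apache.spark.rdd.RDD

def fixedBatchGradients(
    data: RDD[(Double, Array[Double])],
    weights: Array[Double],
    batchSize: Int): RDD[(Long, Array[Double])] = {

  // Same hypothetical squared-loss gradient as in the first sketch.
  def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
    val err = x.zip(w).map { case (xi, wi) => xi * wi }.sum - y
    x.map(_ * err)
  }

  val indexed = data.zipWithIndex().cache()                // 1. stable 0-based index per element
  val numBatches = math.max(1L, indexed.count() / batchSize).toInt

  indexed
    .map { case (point, idx) => (idx / batchSize, point) } // 2. batch id = index / batchSize
    .partitionBy(new HashPartitioner(numBatches))          //    shuffle: builds a ShuffledRDD,
                                                           //    roughly one batch per partition
    .mapPartitions { iter =>                               // 3. compute each batch's gradient sum
      iter.toSeq.groupBy(_._1).iterator.map { case (batchId, points) =>
        val g = points
          .map { case (_, (y, x)) => gradient(weights, x, y) }
          .reduce((a, b) => a.zip(b).map { case (p, q) => p + q })
        (batchId, g)
      }
    }
}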