Hi. What is the best way to pass through a large dataset in small, sequential mini-batches?
For example, with N = 1,000,000 data points and a mini-batch size of 10, we would need to do some computation on each of the mini-batches (0..9), (10..19), (20..29), ..., (N-10..N-1). Would RDD.repartition(N/10).mapPartitions() work? Thanks!
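
Concretely, the kind of pass I have in mind looks something like the sketch below. Rather than repartitioning into N/10 = 100,000 tiny partitions, it keeps a modest partition count and walks each partition's iterator in groups of 10 with Iterator.grouped, which is an alternative to the repartition idea above; the per-batch mean, the partition count of 100, and the names are just placeholders for whatever computation is actually needed:

import org.apache.spark.{SparkConf, SparkContext}

object MiniBatchSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("mini-batch-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    val batchSize = 10
    // 100 partitions of 10,000 elements each; since 10,000 is divisible
    // by the batch size, no batch straddles a partition boundary.
    val data = sc.parallelize(0 until 1000000, numSlices = 100)

    // Walk each partition sequentially in groups of 10: (0..9), (10..19), ...
    // The per-batch computation (a mean here) is only a placeholder.
    val perBatch = data.mapPartitions { iter =>
      iter.grouped(batchSize).map(batch => batch.sum.toDouble / batch.size)
    }

    println(perBatch.take(5).mkString(", "))
    sc.stop()
  }
}

Would something along these lines be the idiomatic approach, or is there a better way?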