Re: Processing order in Spark

2014-10-13 Thread Tobias Pfeiffer
Sean, thanks, I didn't know about repartitionAndSortWithinPartitions, that seems very helpful! Tobias

Re: Processing order in Spark

2014-10-09 Thread Sean Owen
Since an RDD doesn't have any ordering guarantee to begin with, I don't think there is any guarantee about the order in which data is encountered. It can even change when the same RDD is reevaluated. As you say, your scenario 1 is about the best you can do. You can achieve this if you can define s

Re: Processing order in Spark

2014-10-09 Thread x
I doubt Spark has any ability to control the order of task execution. You could approach it from these angles: 1. Write your own partitioner to group your data. 2. Sort the elements within each partition, e.g. by row index. 3. Control the order of the results coming back from Spark in your application. xj
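
To make the three suggestions above concrete, here is a minimal plain-Python sketch of the idea. It does not use Spark at all. The record layout (row_index, key, value), the helper partition_and_sort, and the partition function based on ord() are all assumptions made up for illustration, not anything from Spark's API.

```python
import heapq
from collections import defaultdict


def partition_and_sort(records, num_partitions, partition_of):
    """Steps 1 and 2: group records into partitions, then sort each by row index.

    records: iterable of (row_index, key, value) tuples (hypothetical layout).
    partition_of: function mapping a key to an int (hypothetical partitioner).
    """
    partitions = defaultdict(list)
    for record in records:
        _, key, _ = record
        partitions[partition_of(key) % num_partitions].append(record)
    # Sort within each partition by the row index (first tuple field).
    return {p: sorted(rows, key=lambda r: r[0]) for p, rows in partitions.items()}


def merge_in_order(partitions):
    """Step 3: on the application side, merge the sorted partitions by row index."""
    return list(heapq.merge(*partitions.values(), key=lambda r: r[0]))


# Records arrive in arbitrary order, each carrying its original row index.
records = [(3, "b", 30), (0, "a", 0), (2, "a", 20), (1, "b", 10)]

# A deterministic stand-in partitioner: first character code of the key.
parts = partition_and_sort(records, num_partitions=2, partition_of=lambda k: ord(k[0]))
ordered = merge_in_order(parts)
```

Each partition ends up internally ordered by row index, and the final merge restores the global order without re-sorting everything. In a real Spark job the grouping and per-partition sort would be done by the cluster, and only the last step would run in the driver or the consuming application.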

Processing order in Spark

2014-10-09 Thread Tobias Pfeiffer
Hi, I am planning an application where the order of items is somehow important. In particular it is an online machine learning application where learning in a different order will lead to a different model. I was wondering about ordering guarantees for Spark applications. So if I say myRdd.map(so