Re: Processing order in Spark

2014-10-13 Thread Tobias Pfeiffer
Sean, thanks, I didn't know about repartitionAndSortWithinPartitions, that seems very helpful! Tobias

Re: Processing order in Spark

2014-10-09 Thread Sean Owen
Since an RDD doesn't have any ordering guarantee to begin with, I don't think there is any guarantee about the order in which data is encountered. It can even change when the same RDD is reevaluated. As you say, your scenario 1 is about the best you can do. You can achieve this if you can define s

Re: Processing order in Spark

2014-10-09 Thread x
I doubt Spark has the ability to control the order of task execution. You could approach it from these angles: 1. Write your own partitioner to group your data. 2. Sort the elements within each partition, e.g. by row index. 3. Control the order of the results returned from Spark in your application. xj
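
For illustration, the three steps above can be modelled in plain Python. This is only a rough sketch of the idea, not Spark code: the helper partition_and_sort, the sample records and the (group_id, row_index) key layout are all made up for the example. In Spark itself, the first two steps would correspond to a custom partitioner plus a sort within each partition, which is what the repartitionAndSortWithinPartitions method mentioned elsewhere in this thread appears to combine.

```python
from collections import defaultdict


def partition_and_sort(records, num_partitions, partition_func=hash):
    """Group (key, value) records into partitions, then sort each partition by key.

    Hypothetical helper for illustration only; it is not part of any Spark API.
    """
    partitions = defaultdict(list)
    for key, value in records:
        # Step 1: the partitioner decides which partition a record belongs to.
        partitions[partition_func(key) % num_partitions].append((key, value))
    # Step 2: sort the elements inside each partition by their key.
    return [sorted(partitions[i], key=lambda kv: kv[0]) for i in range(num_partitions)]


# Made-up records keyed by (group_id, row_index).
records = [((1, 2), "c"), ((0, 1), "b"), ((1, 0), "a"), ((0, 0), "d"), ((1, 1), "e")]

# Partition on group_id only, so all rows of one group land in the same partition.
parts = partition_and_sort(records, 2, partition_func=lambda key: key[0])

# Step 3: the application chooses the order in which it consumes the partitions.
ordered = [value for part in parts for _, value in part]
```

Within each partition the rows come out in row_index order regardless of the order in which they arrived. The order across partitions is imposed only in the last line, by the consuming application, which matches the point that Spark itself is not being asked to schedule tasks in any particular order.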