Sean,
thanks, I didn't know about repartitionAndSortWithinPartitions, that seems
very helpful!
Tobias
Since an RDD doesn't have any ordering guarantee to begin with, I
don't think there is any guarantee about the order in which data is
encountered. It can even change when the same RDD is re-evaluated.
As you say, your scenario 1 is about the best you can do. You can
achieve this if you can define s
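A minimal Scala sketch of the within-partition approach Sean is describing, using `repartitionAndSortWithinPartitions` (available on pair RDDs with an ordered key since Spark 1.2); the RDD contents and partition count here are made up for illustration, and `sc` is assumed to be an existing SparkContext:

```scala
import org.apache.spark.HashPartitioner

// Hypothetical pair RDD: (sortKey, record).
val pairs = sc.parallelize(Seq((3, "c"), (1, "a"), (2, "b")))

// Repartition by key and sort each partition by key, in a single shuffle.
// Note: the ordering guarantee holds only *within* each partition,
// not across partitions.
val sorted = pairs.repartitionAndSortWithinPartitions(new HashPartitioner(4))

// Each partition's iterator now yields its elements in key order.
sorted.foreachPartition { iter =>
  iter.foreach { case (k, v) => /* consume in order */ }
}
```

This is cheaper than calling `repartition` followed by `sortByKey`, because the sort is pushed into the shuffle machinery instead of being a separate pass.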
I doubt Spark has any ability to control the order of task
execution.
You could try the following approaches:
1. Write your own partitioner to group your data.
2. Sort the elements within each partition, e.g. by a row index.
3. Control the order of the results coming back from Spark in your
application.
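The three steps above might be combined as in the Scala sketch below; the `GroupPartitioner`, the `records` RDD, and the `groupId`/`rowIndex` fields are all hypothetical names introduced for illustration:

```scala
import org.apache.spark.Partitioner

// 1. A custom partitioner that groups records by a (made-up) group id,
//    taken from the first element of a (groupId, rowIndex) key.
class GroupPartitioner(val numPartitions: Int) extends Partitioner {
  def getPartition(key: Any): Int = key match {
    case (groupId: Int, _) => math.abs(groupId.hashCode) % numPartitions
  }
}

// 2. Key each record by (groupId, rowIndex) and sort within partitions;
//    the tuple's default ordering sorts by group, then by row index.
val keyed = records.map(r => ((r.groupId, r.rowIndex), r))
val grouped = keyed.repartitionAndSortWithinPartitions(new GroupPartitioner(8))

// 3. collect() returns elements in partition order (partition 0 first),
//    so the application sees a deterministic overall order.
//    Only safe when the result fits in driver memory.
val inOrder = grouped.collect()
```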
xj
Hi,
I am planning an application where the order of items is somehow important.
In particular it is an online machine learning application where learning
in a different order will lead to a different model.
I was wondering about ordering guarantees for Spark applications. So if I
say myRdd.map(so