Hi, I have the following RDD:

    val conf = new SparkConf()
      .setAppName("<< Testing Sorting >>")
    val sc = new SparkContext(conf)

    val L = List(
      (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
      (new Student("CCC", 100, 24), "I'm CCC"),
      (new Student("Jack", 90, 25), "I'm Jack"),
      (new Student("Tom", 60, 35), "I'm Tom"),
      (new Student("Lucy", 78, 22), "I'm Lucy"))

    val rdd = sc.parallelize(L, 3)

where Student is a class defined as follows:

    class Student(val name: String, val score: Int, val age: Int) {
      override def toString = "name:" + name + ", score:" + score + ", age:" + age
    }

I want to sort the RDD by key, but when I wrote rdd.sortByKey the compiler complained "No implicit Ordering defined". I take this to mean that I must make the class extend Ordered and provide a method named compare. The problem is that Student comes from a third-party library, so I cannot change its definition.

Is there a sorting method that accepts a custom compare function, so that the RDD is sorted according to the function I provide?

One more question: if I want to sort an RDD[(K, V)] by value, do I have to map the RDD so that key and value swap positions in the tuple? Are there any functions that sort an RDD by something other than the key?

Thanks
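
One possible approach to both questions, as a minimal sketch rather than a definitive answer: sortByKey only needs an implicit Ordering[Student] in scope, and that can be defined outside the class, so the third-party definition stays untouched. For sorting by value, RDD.sortBy takes a key-extraction function, so no swap should be needed. Several things below are assumptions and not from the original post: the stand-in Student is marked Serializable (the shuffle needs to serialize keys, and the real library class may or may not be), the ordering compares by score (an arbitrary choice), and the names SortStudents and byScore and the local[2] master are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Older Spark releases may also need: import org.apache.spark.SparkContext._

// Stand-in for the third-party class. Assumption: it is Serializable,
// because sorting shuffles the keys between partitions.
class Student(val name: String, val score: Int, val age: Int) extends Serializable {
  override def toString = "name:" + name + ", score:" + score + ", age:" + age
}

object SortStudents {
  // Ordering defined outside the class; comparing by score is an arbitrary choice.
  implicit val byScore: Ordering[Student] = Ordering.by((s: Student) => s.score)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("<< Testing Sorting >>")
      .setMaster("local[2]") // assumption: local run for illustration
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(List(
        (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
        (new Student("CCC", 100, 24), "I'm CCC"),
        (new Student("Jack", 90, 25), "I'm Jack"),
        (new Student("Tom", 60, 35), "I'm Tom"),
        (new Student("Lucy", 78, 22), "I'm Lucy")), 3)

      // sortByKey picks up the implicit Ordering[Student] defined above.
      rdd.sortByKey().collect().foreach(println)

      // sortBy sorts by whatever the function extracts; here, the value.
      rdd.sortBy(_._2).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

A different comparison (by age, or by score and then name) would only change the Ordering definition, for example Ordering.by((s: Student) => (s.score, s.name)).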