Hi,

I have the following RDD:

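  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.SparkContext._  // pair-RDD implicits (sortByKey etc.)

  val conf = new SparkConf()
          .setAppName("<< Testing Sorting >>")
  val sc = new SparkContext(conf)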

  val L = List(
      (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
      (new Student("CCC", 100, 24), "I'm CCC"),
      (new Student("Jack", 90, 25), "I'm Jack"),
      (new Student("Tom", 60, 35), "I'm Tom"),
      (new Student("Lucy", 78, 22), "I'm Lucy"))

  val rdd = sc.parallelize(L, 3)


where Student is a class defined as follows:

class Student(val name:String, val score:Int, val age:Int)  {

 override def toString =
                 "name:" + name + ", score:" + score + ", age:" + age

}



I want to sort the rdd by key, but when I called rdd.sortByKey the compiler
complained "No implicit Ordering defined". As far as I understand, this
means Student must extend Ordered and implement a method named compare.
The problem is that Student comes from a third-party library, so I cannot
change its definition. Is there a sorting method that accepts a custom
compare function, so that the rdd is sorted according to the function I
provide?
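
To make the first question concrete, here is a minimal sketch of one
possible approach: supplying an Ordering[Student] from outside the class
rather than extending Ordered. It is a sketch under assumptions, not a
confirmed answer. The names SortByKeySketch, byScore and namesSortedByKey
are hypothetical, the choice to compare by score is arbitrary, the local
master is only there so the example runs on one machine, and the stand-in
Student is marked Serializable, which the real library class may not be.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; only matters on older Spark releases

// Stand-in for the third-party class. Serializable is an assumption made
// here so that instances can be shipped between tasks.
class Student(val name: String, val score: Int, val age: Int) extends Serializable {
  override def toString = "name:" + name + ", score:" + score + ", age:" + age
}

object SortByKeySketch {
  // Hypothetical ordering: compare students by score. It is declared
  // outside Student, so the library class itself is left untouched.
  implicit val byScore: Ordering[Student] = Ordering.by((s: Student) => s.score)

  // Sorts the pairs by key using the Ordering[Student] in implicit scope
  // and returns the student names in sorted order.
  def namesSortedByKey(sc: SparkContext): List[String] = {
    val rdd = sc.parallelize(List(
      (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
      (new Student("CCC", 100, 24), "I'm CCC"),
      (new Student("Jack", 90, 25), "I'm Jack"),
      (new Student("Tom", 60, 35), "I'm Tom"),
      (new Student("Lucy", 78, 22), "I'm Lucy")), 3)
    rdd.sortByKey().map(_._1.name).collect().toList
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("<< Testing Sorting >>")
      .setMaster("local[2]") // assumption: run locally for the sketch
    val sc = new SparkContext(conf)
    try println(namesSortedByKey(sc).mkString(", "))
    finally sc.stop()
  }
}
```

If this works as expected, a different comparison (by age, by name, or
descending via byScore.reverse) would only require a different implicit
Ordering in scope at the call site.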

One more question: if I want to sort an RDD[(K, V)] by value, do I have to
map the rdd so that key and value swap positions in the tuple? Are there
any functions that sort an rdd by something other than the key?
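
For the second question, the sketch below shows the kind of call that
would avoid the swap, assuming a Spark release whose RDD class provides
sortBy(f), which sorts by whatever f extracts from each element. The names
SortBySketch, sample, namesSortedByValue and namesSortedByAge are
hypothetical, and the stand-in Student is again assumed to be Serializable.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Stand-in for the third-party class (Serializable is an assumption).
class Student(val name: String, val score: Int, val age: Int) extends Serializable {
  override def toString = "name:" + name + ", score:" + score + ", age:" + age
}

object SortBySketch {
  // Same sample data as in the question, split into 3 partitions.
  def sample(sc: SparkContext): RDD[(Student, String)] = sc.parallelize(List(
    (new Student("XiaoTao", 80, 29), "I'm Xiaotao"),
    (new Student("CCC", 100, 24), "I'm CCC"),
    (new Student("Jack", 90, 25), "I'm Jack"),
    (new Student("Tom", 60, 35), "I'm Tom"),
    (new Student("Lucy", 78, 22), "I'm Lucy")), 3)

  // Sort by the value (the String) without swapping key and value.
  def namesSortedByValue(sc: SparkContext): List[String] =
    sample(sc).sortBy(_._2).map(_._1.name).collect().toList

  // Sort by a field of the key. The extracted field is an Int, so the
  // built-in Ordering[Int] is used and no Ordering[Student] is required.
  def namesSortedByAge(sc: SparkContext): List[String] =
    sample(sc).sortBy(_._1.age).map(_._1.name).collect().toList

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("<< Testing Sorting >>")
      .setMaster("local[2]") // assumption: run locally for the sketch
    val sc = new SparkContext(conf)
    try {
      println(namesSortedByValue(sc).mkString(", "))
      println(namesSortedByAge(sc).mkString(", "))
    } finally sc.stop()
  }
}
```

On a release without sortBy, the fallback would presumably be the swap
described above: map each pair to (value, key), call sortByKey, and map
back.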

Thanks
