For converting an Array or any List to a RDD, we can try using : >sc.parallelize(groupedScore) //or whatever the name of the list variable is
On Mon, Dec 1, 2014 at 8:14 PM, Xuefeng Wu <ben...@gmail.com> wrote: > Hi, I have a problem, it is easy in Scala code, but I can not take the top > N from RDD as RDD. > > > There are 10000 Student Score, ask take top 10 age, and then take top 10 > from each age, the result is 100 records. > > The Scala code is here, but how can I do it in RDD, *for RDD.take return > is Array, but other RDD.* > > example Scala code: > > import scala.util.Random > > case class StudentScore(age: Int, num: Int, score: Int, name: Int) > > val scores = for { > i <- 1 to 10000 > } yield { > StudentScore(Random.nextInt(100), Random.nextInt(100), Random.nextInt(), > Random.nextInt()) > } > > > def takeTop(scores: Seq[StudentScore], byKey: StudentScore => Int): Seq[(Int, > Seq[StudentScore])] = { > val groupedScore = scores.groupBy(byKey) > .map{case (_, _scores) => > (_scores.foldLeft(0)((acc, v) => acc + v.score), _scores)}.toSeq > groupedScore.sortBy(_._1).take(10) > } > > val topScores = for { > (_, ageScores) <- takeTop(scores, _.age) > (_, numScores) <- takeTop(ageScores, _.num) > } yield { > numScores > } > > topScores.size > > > -- > > ~Yours, Xuefeng Wu/吴雪峰 敬上 > >