Try something like this.
scala> val a = sc.parallelize(List(1,2,3,4,5))
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> val b = sc.parallelize(List(6,7,8,9,10))
b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:12

scala> val x = a zip b
x: org.apache.spark.rdd.RDD[(Int, Int)] = ZippedRDD[3] at zip at <console>:16

scala> val f = x.map( x => List(x._1, x._2) )
f: org.apache.spark.rdd.RDD[List[Int]] = MappedRDD[5] at map at <console>:18

scala> f.foreach(println)
List(2, 7)
List(1, 6)
List(5, 10)
List(3, 8)
List(4, 9)

(For the case in your message, where both lists live inside a single RDD, see the sketch after the quoted message below.)

On Tue, Aug 12, 2014 at 12:42 AM, Kevin Jung <itsjb.j...@samsung.com> wrote:

> Hi
> It may be a simple question, but I cannot figure out the most efficient
> way.
> There is an RDD containing lists:
>
> RDD
> (
> List(1,2,3,4,5)
> List(6,7,8,9,10)
> )
>
> I want to transform this to
>
> RDD
> (
> List(1,6)
> List(2,7)
> List(3,8)
> List(4,9)
> List(5,10)
> )
>
> And I want to achieve this without using the collect method, because a
> real-world RDD can have a lot of elements, and collecting them may cause
> an out-of-memory error.
> Any ideas will be welcome.
>
> Best regards
> Kevin
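Note that zip requires the two RDDs to have the same number of partitions and the same number of elements in each partition; that holds above because both were parallelized from equal-length lists. If, as in the original message, the rows sit inside a single RDD[List[Int]] rather than in two separate RDDs, the transpose can still be done without collect by keying every value with its column index and grouping. The following is only a sketch (variable names are illustrative, it assumes equal-length rows, and groupByKey shuffles, so each column's values must fit in memory on one executor):

val rows = sc.parallelize(List(List(1,2,3,4,5), List(6,7,8,9,10)))

val transposed = rows
  .zipWithIndex()                                         // (row, rowIndex)
  .flatMap { case (row, r) =>
    row.zipWithIndex.map { case (v, c) => (c, (r, v)) }   // key each value by column index
  }
  .groupByKey()                                           // gather one column per key
  .map { case (c, cells) =>
    (c, cells.toList.sortBy(_._1).map(_._2))              // restore row order within a column
  }
  .sortByKey()                                            // keep columns in order
  .map(_._2)

transposed.foreach(println)
// List(1, 6), List(2, 7), ... (print order across partitions is not guaranteed)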