Try something like this.
scala> val a = sc.parallelize(List(1,2,3,4,5))
a: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:12

scala> val b = sc.parallelize(List(6,7,8,9,10))
b: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[1] at parallelize at <console>:12

scala> val x = a zip b
x: org.apache.spark.rdd.RDD[(Int, Int)] = ZippedRDD[3] at zip at <console>:16

scala> val f = x.map( x => List(x._1, x._2) )
f: org.apache.spark.rdd.RDD[List[Int]] = MappedRDD[5] at map at <console>:18

scala> f.foreach(println)
List(2, 7)
List(1, 6)
List(5, 10)
List(3, 8)
List(4, 9)

(For the case in your message, where both lists live inside a single RDD, see the sketch after the quoted message below.)

On Tue, Aug 12, 2014 at 12:42 AM, Kevin Jung <itsjb.j...@samsung.com> wrote:

> Hi
> It may be a simple question, but I cannot figure out the most efficient
> way.
> There is an RDD containing lists:
>
> RDD
> (
> List(1,2,3,4,5)
> List(6,7,8,9,10)
> )
>
> I want to transform this to
>
> RDD
> (
> List(1,6)
> List(2,7)
> List(3,8)
> List(4,9)
> List(5,10)
> )
>
> And I want to achieve this without using the collect method, because a
> real-world RDD can have a lot of elements, and collecting them may cause
> an out-of-memory error.
> Any ideas will be welcome.
>
> Best regards
> Kevin
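Note that zip requires the two RDDs to have the same number of partitions and the same number of elements in each partition; that holds above because both were parallelized from equal-length lists. If, as in the original message, the rows sit inside a single RDD[List[Int]] rather than in two separate RDDs, the transpose can still be done without collect by keying every value with its column index and grouping. The following is only a sketch (variable names are illustrative, it assumes equal-length rows, and groupByKey shuffles, so each column's values must fit in memory on one executor):

val rows = sc.parallelize(List(List(1,2,3,4,5), List(6,7,8,9,10)))

val transposed = rows
  .zipWithIndex()                                         // (row, rowIndex)
  .flatMap { case (row, r) =>
    row.zipWithIndex.map { case (v, c) => (c, (r, v)) }   // key each value by column index
  }
  .groupByKey()                                           // gather one column per key
  .map { case (c, cells) =>
    (c, cells.toList.sortBy(_._1).map(_._2))              // restore row order within a column
  }
  .sortByKey()                                            // keep columns in order
  .map(_._2)

transposed.foreach(println)
// List(1, 6), List(2, 7), ... (print order across partitions is not guaranteed)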