Re: Key-Value in PairRDD

2014-08-26 Thread Sean Owen
I'd suggest first reading the scaladoc for RDD and PairRDDFunctions to familiarize yourself with all the operations available:

http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunc
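To give a rough idea of what those two pages cover, here is a minimal sketch that builds a pair RDD the same way as in the original question and then calls a few of the documented operations on it. The input lines, the object name PairRddSketch, the helper parseEdge, and the local[2] master are all assumptions made for illustration, not taken from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Brings the pair-RDD implicits into scope; only required before Spark 1.3.
import org.apache.spark.SparkContext._

object PairRddSketch {
  // Hypothetical helper: split a line on whitespace and keep the first two fields.
  def parseEdge(s: String): (String, String) = {
    val fields = s.split("\\s+")
    (fields(0), fields(1))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PairRddSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumed sample input: one "key value" pair per line, with one duplicate.
      val lines = sc.parallelize(Seq("1 2", "1 3", "2 1", "3 1", "4 1", "1 2"))

      // Same pipeline as in the question: parse, de-duplicate, group by key.
      val nodes = lines.map(parseEdge).distinct().groupByKey().cache()

      // Plain RDD operation: bring everything to the driver and print it.
      nodes.collect().foreach(println)

      // PairRDDFunctions: the keys only.
      println(nodes.keys.collect().sorted.mkString(","))

      // PairRDDFunctions: transform each value, leaving the key untouched.
      val counts = nodes.mapValues(_.size)
      counts.collect().foreach(println)

      // PairRDDFunctions: all values stored under a single key.
      val valuesForKey1 = nodes.lookup("1").flatten
      println(valuesForKey1.sorted.mkString(","))
    } finally {
      sc.stop()
    }
  }
}
```

Since groupByKey gives each key a collection of values, operations such as mapValues and lookup are usually the easiest way to get at those collections without re-parsing the printed output.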

Key-Value in PairRDD

2014-08-26 Thread Deep Pradhan
I have the following code:

    val nodes = lines.map(s => {
      val fields = s.split("\\s+")
      (fields(0), fields(1))
    }).distinct().groupByKey().cache()

and when I print out the nodes RDD I get the following:

    (4,ArrayBuffer(1))(2,ArrayBuffer(1))(3,ArrayBuffer(1))(1,ArrayBuffer(3, 2,