I'd suggest first reading the scaladoc for RDD and PairRDDFunctions to familiarize yourself with all the operations available:
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

You'll quickly find keys (in PairRDDFunctions) and max() (in RDD).

On Tue, Aug 26, 2014 at 10:54 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:

> I have the following code
>
> val nodes = lines.map(s =>{
>   val fields = s.split("\\s+")
>   (fields(0),fields(1))
> }).distinct().groupByKey().cache()
>
> and when I print out the nodes RDD I get the following
>
> (4,ArrayBuffer(1))
> (2,ArrayBuffer(1))
> (3,ArrayBuffer(1))
> (1,ArrayBuffer(3, 2, 4))
>
> Now, I want to print only the key part of the RDD and also the maximum value
> among the keys. How should I do that?
> Thank You
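A minimal sketch of putting keys and max() together, reusing the transformation from the question. It assumes spark-core is on the classpath and runs in local mode; the edge list and the helper names buildNodes and maxKey are hypothetical, not part of Spark. Because the keys come from split(), they are Strings, so the sketch parses them to Int before taking the maximum.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on Spark versions before 1.3
import org.apache.spark.rdd.RDD

object KeysAndMax {
  // Same transformation as in the question: "src dst" lines -> (src, distinct dsts).
  // buildNodes is a hypothetical helper name used for illustration.
  def buildNodes(lines: RDD[String]): RDD[(String, Iterable[String])] =
    lines.map { s =>
      val fields = s.split("\\s+")
      (fields(0), fields(1))
    }.distinct().groupByKey()

  // keys (PairRDDFunctions) keeps only the keys; max() (RDD) returns the largest one.
  // The keys are Strings, so they are parsed to Int first: comparing the Strings
  // directly is lexicographic, where "10" < "9".
  // maxKey is a hypothetical helper name used for illustration.
  def maxKey(nodes: RDD[(String, Iterable[String])]): Int =
    nodes.keys.map(_.toInt).max()

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KeysAndMax").setMaster("local[*]"))

    // Hypothetical edge list standing in for `lines` from the question.
    val lines = sc.parallelize(Seq("1 2", "1 3", "1 4", "2 1", "3 1", "4 1"))
    val nodes = buildNodes(lines).cache()

    // Print only the keys; collect() brings them to the driver so println output appears there.
    nodes.keys.collect().foreach(println)
    println(s"max key = ${maxKey(nodes)}")

    sc.stop()
  }
}
```

Calling max() directly on the String keys also compiles, but it compares lexicographically; that only agrees with numeric order while every node ID has the same number of digits.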