Re: Key-Value in PairRDD

Sean Owen Tue, 26 Aug 2014 03:01:01 -0700

I'd suggest first reading the scaladoc for RDD and PairRDDFunctions to
familiarize yourself with all the operations available:


http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions

You'll quickly find keys() and max().
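
For example, a minimal sketch along those lines, assuming a local SparkContext and a small made-up input standing in for your `lines` RDD (the object name, app name, master setting, and sample data are all hypothetical, not from your code):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._ // pair-RDD implicits; required on older Spark 1.x releases

object KeysAndMax {
  def main(args: Array[String]): Unit = {
    // Assumption: run locally with two threads, just for illustration.
    val conf = new SparkConf().setAppName("KeysAndMax").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical stand-in for the original `lines` RDD (whitespace-separated edge list).
    val lines = sc.parallelize(Seq("1 3", "1 2", "1 4", "2 1", "3 1", "4 1"))

    // Same pipeline as in the question.
    val nodes = lines.map { s =>
      val fields = s.split("\\s+")
      (fields(0), fields(1))
    }.distinct().groupByKey().cache()

    // keys is available on RDDs of pairs through PairRDDFunctions.
    val keys = nodes.keys

    // Print only the keys; collect() brings them to the driver,
    // so this is only sensible for small RDDs.
    keys.collect().foreach(println)

    // The keys are Strings here, so convert to Int before taking max()
    // to get a numeric rather than lexicographic maximum.
    val maxKey = keys.map(_.toInt).max()
    println(s"max key = $maxKey")

    sc.stop()
  }
}
```

Note that calling max() directly on the String keys would compare them lexicographically, so "9" would sort above "10"; that is why the sketch converts to Int first.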

On Tue, Aug 26, 2014 at 10:54 AM, Deep Pradhan
<pradhandeep1...@gmail.com> wrote:
> I have the following code
>
> val nodes = lines.map(s =>{
>         val fields = s.split("\\s+")
>         (fields(0),fields(1))
>         }).distinct().groupByKey().cache()
>
> and when I print out the nodes RDD I get the following
>
> (4,ArrayBuffer(1))
> (2,ArrayBuffer(1))
> (3,ArrayBuffer(1))
> (1,ArrayBuffer(3, 2, 4))
>
> Now, I want to print only the key part of the RDD and also the maximum value
> among the keys. How should I do that?
> Thank You
>
>

