If you mean that each value is already a Seq or similar collection, then you can just take the top 1 element, ordered by the size of the value:
rdd.top(1)(Ordering.by(_._2.size))

On Thu, Aug 28, 2014 at 9:34 AM, Deep Pradhan <pradhandeep1...@gmail.com> wrote:
> Hi,
> I have a RDD of key-value pairs. Now I want to find the "key" for which
> the "values" has the largest number of elements. How should I do that?
> Basically I want to select the key for which the number of items in values
> is the largest.
> Thank You
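
In case it helps, here is a minimal self-contained sketch of the top(1) approach from the start of this reply. The object name LargestValueKey, the helper keyWithMostValues, the sample data, and the local[*] master are all made up for illustration, and it assumes the values are Seq[Int]. The type parameters on Ordering.by are written out only so the example does not rely on type inference.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LargestValueKey {

  // Hypothetical helper: returns the key whose value collection is largest,
  // or None if the RDD is empty. Ties are broken arbitrarily.
  def keyWithMostValues(rdd: RDD[(String, Seq[Int])]): Option[String] = {
    // top(1) returns an Array holding the single largest element
    // according to the supplied Ordering (here: size of the value).
    val largest = rdd.top(1)(Ordering.by[(String, Seq[Int]), Int](_._2.size))
    largest.headOption.map(_._1)
  }

  def main(args: Array[String]): Unit = {
    // Local master chosen only so the sketch runs without a cluster.
    val conf = new SparkConf().setAppName("largest-value-key").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data: each key maps to a collection of values.
      val rdd = sc.parallelize(Seq(
        "a" -> Seq(1, 2),
        "b" -> Seq(1, 2, 3, 4),
        "c" -> Seq(1)
      ))
      println(keyWithMostValues(rdd))
    } finally {
      sc.stop()
    }
  }
}
```

Note that top(1) gives back an Array rather than a single element, so the sketch uses headOption to cope with an empty RDD.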