reduceByKey - add values to a list

2015-06-25 Thread Kannappan Sirchabesan
Hi, I am trying to see what is the best way to reduce the values of an RDD of (key,value) pairs into (key,ListOfValues) pairs. I know various ways of achieving this, but I am looking for an efficient, elegant one-liner if there is one. Example: Input RDD: (USA, California), (UK, Yorkshire), (US

Re: Scala/Python or Java

2015-06-25 Thread Kannappan Sirchabesan
Hi, If you are new to all three languages, go with Scala or Python. Python is easier, but check out Scala and see if it is easy enough for you. With the launch of data frames, it might not even matter which language you choose performance-wise. Thanks, Kannappan > On Jun 25, 2015, at 10:02 PM

Re: reduceByKey - add values to a list

2015-06-25 Thread Kannappan Sirchabesan
n contrast to groupByKey, this won't return 'Yorkshire' as a one element > list but as a plain string (i.e. in the same way as in your output example). > > Hope this helps! > -Sven > > On Thu, Jun 25, 2015 at 3:37 PM, Kannappan Sirchabesan <mailto:buildka...@gmail.

Re: reduceByKey - add values to a list

2015-06-25 Thread Kannappan Sirchabesan
iner, mergeValue, mergeCombiners) > > Best, > -Sven > > On Thu, Jun 25, 2015 at 4:34 PM, Kannappan Sirchabesan <mailto:buildka...@gmail.com>> wrote: > Thanks. This should work fine. > > I am trying to avoid groupByKey for performance reasons as the input is a >
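
The reply above refers to a three-function pattern (a combiner-creating function, mergeValue, mergeCombiners). As a rough illustration of how such a pattern can collect values into a list per key, here is a minimal pure-Python sketch that simulates it over in-memory partitions. It does not use Spark; the helper name combine_by_key, the partition layout, and the "Colorado" record are assumptions made for the example and do not come from the thread.

```python
from typing import Callable, Dict, Hashable, List, Tuple, TypeVar

V = TypeVar("V")
C = TypeVar("C")


def combine_by_key(
    partitions: List[List[Tuple[Hashable, V]]],
    create_combiner: Callable[[V], C],
    merge_value: Callable[[C, V], C],
    merge_combiners: Callable[[C, C], C],
) -> Dict[Hashable, C]:
    """Hypothetical local stand-in for the three-function pattern.

    Each inner list plays the role of one partition.
    """
    # Step 1: combine values inside each partition independently.
    partials: List[Dict[Hashable, C]] = []
    for part in partitions:
        acc: Dict[Hashable, C] = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in.
                acc[key] = merge_value(acc[key], value)
            else:
                # First value for this key in this partition.
                acc[key] = create_combiner(value)
        partials.append(acc)

    # Step 2: merge the per-partition results for each key.
    result: Dict[Hashable, C] = {}
    for acc in partials:
        for key, combined in acc.items():
            if key in result:
                result[key] = merge_combiners(result[key], combined)
            else:
                result[key] = combined
    return result


# Assumed sample data, split into two made-up partitions.
partitions = [
    [("USA", "California"), ("UK", "Yorkshire")],
    [("USA", "Colorado")],
]

grouped = combine_by_key(
    partitions,
    create_combiner=lambda v: [v],         # start a one-element list
    merge_value=lambda acc, v: acc + [v],  # append within a partition
    merge_combiners=lambda a, b: a + b,    # concatenate across partitions
)
print(grouped)
```

With these three functions every key ends up with a list, including keys that have only a single value. In PySpark the same three functions would be passed to rdd.combineByKey(createCombiner, mergeValue, mergeCombiners); whether that outperforms groupByKey for a given input is something to measure rather than assume.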