Hi,
I am looking for the best way to reduce the values of an RDD of
(key, value) pairs into (key, ListOfValues) pairs. I know various ways of
achieving this, but I am looking for an efficient, elegant one-liner, if there
is one.
Example:
Input RDD: (USA, California), (UK, Yorkshire), (US
Hi,
If you are new to all three languages, go with Scala or Python. Python is
easier, but check out Scala and see if it is easy enough for you. With the
launch of DataFrames, it might not even matter which language you choose
performance-wise.
Thanks,
Kannappan
> On Jun 25, 2015, at 10:02 PM
> In contrast to groupByKey, this won't return 'Yorkshire' as a one-element
> list but as a plain string (i.e. in the same way as in your output example).
>
> Hope this helps!
> -Sven
>
> On Thu, Jun 25, 2015 at 3:37 PM, Kannappan Sirchabesan <mailto:buildka...@gmail.
iner, mergeValue, mergeCombiners)
>
> Best,
> -Sven
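
For readers following the thread, here is a minimal sketch of how the three functions passed to combineByKey cooperate to build a list of values per key. It is a plain-Python model of the semantics, not Spark itself: the helper combine_by_key, the two-partition layout, and the 'Colorado' record are assumptions made for illustration (the original example input is cut off above). In PySpark the same three functions could be passed as rdd.combineByKey(create_combiner, merge_value, merge_combiners).

```python
def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Hypothetical plain-Python model of combineByKey over a list of partitions."""
    per_partition = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key in acc:
                # Key already seen in this partition: fold the value in.
                acc[key] = merge_value(acc[key], value)
            else:
                # First value for this key in this partition.
                acc[key] = create_combiner(value)
        per_partition.append(acc)

    # Merge the per-partition accumulators, as happens after the shuffle.
    merged = {}
    for acc in per_partition:
        for key, combiner in acc.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], combiner)
            else:
                merged[key] = combiner
    return merged


def create_combiner(value):
    # Start a new list from the first value seen for a key.
    return [value]


def merge_value(values, value):
    # Append another value from the same partition.
    values.append(value)
    return values


def merge_combiners(left, right):
    # Concatenate lists built on different partitions.
    left.extend(right)
    return left


# Assumed sample data; only the first two pairs appear in the thread.
partitions = [
    [("USA", "California"), ("UK", "Yorkshire")],
    [("USA", "Colorado")],
]
result = combine_by_key(partitions, create_combiner, merge_value, merge_combiners)
print(result)
```

Note that this variant returns 'Yorkshire' as a one-element list, the same shape groupByKey would give, since create_combiner always wraps the first value in a list.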
>
> On Thu, Jun 25, 2015 at 4:34 PM, Kannappan Sirchabesan <mailto:buildka...@gmail.com>> wrote:
> Thanks. This should work fine.
>
> I am trying to avoid groupByKey for performance reasons as the input is a
>