Hey Kannappan,

First of all, what is the reason for avoiding groupByKey? This is exactly
what it is for. If you must use reduceByKey with a one-liner, then take a
look at this:

lambda a, b: (a if type(a) == list else [a]) + (b if type(b) == list else [b])

In contrast to groupByKey, this won't return 'Yorkshire' as a one-element
list but as a plain string (i.e., in the same way as in your output example).
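To see what that merge function does, here is a minimal sketch in plain
Python that simulates reduceByKey locally (no Spark needed; itertools.groupby
stands in for the shuffle, and the merge function is applied pairwise exactly
as reduceByKey would apply it):

```python
from functools import reduce
from itertools import groupby
from operator import itemgetter

# The merge function from above: wraps bare values in lists on demand
merge = lambda a, b: (a if type(a) == list else [a]) + (b if type(b) == list else [b])

pairs = [("USA", "California"), ("UK", "Yorkshire"), ("USA", "Colorado")]

# Simulate reduceByKey: group pairs by key, then fold each group's values
# with merge. A key with a single value keeps its value as a plain string.
result = {
    key: reduce(merge, (value for _, value in group))
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0))
}
print(result)  # {'UK': 'Yorkshire', 'USA': ['California', 'Colorado']}
```

Note the single-value case: reduce over one element returns it untouched,
which is why 'Yorkshire' stays a string rather than becoming ['Yorkshire'].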

Hope this helps!
-Sven

On Thu, Jun 25, 2015 at 3:37 PM, Kannappan Sirchabesan <buildka...@gmail.com
> wrote:

> Hi,
>   I am trying to see what is the best way to reduce the values of an RDD
> of (key, value) pairs into a (key, listOfValues) pair. I know various ways
> of achieving this, but I am looking for an efficient, elegant one-liner if
> there is one.
>
> Example:
> Input RDD: (USA, California), (UK, Yorkshire), (USA, Colorado)
> Output RDD: (USA, [California, Colorado]), (UK, Yorkshire)
>
> Is it possible to use reduceByKey or foldByKey to achieve this, instead of
> groupByKey?
>
> Something equivalent to the cons operator from LISP, so that I could just
> say reduceByKey(lambda x, y: (cons x y))? Maybe it is more a Python
> question than a Spark question: how to create a list from 2 elements
> without a starting empty list?
>
> Thanks,
> Kannappan
>


-- 
www.skrasser.com
