Hey Kannappan,

First of all, what is the reason for avoiding groupByKey? Collecting the values for a key is exactly what it is for. If you must use reduceByKey with a one-liner, then take a look at this:
lambda a, b: (a if type(a) == list else [a]) + (b if type(b) == list else [b])

In contrast to groupByKey, this won't return 'Yorkshire' as a one-element list but as a plain string (i.e. in the same way as in your output example). Hope this helps!

-Sven

On Thu, Jun 25, 2015 at 3:37 PM, Kannappan Sirchabesan <buildka...@gmail.com> wrote:
> Hi,
> I am trying to see what is the best way to reduce the values of an RDD of
> (key, value) pairs into a (key, listOfValues) pair. I know various ways of
> achieving this, but I am looking for an efficient, elegant one-liner if
> there is one.
>
> Example:
> Input RDD: (USA, California), (UK, Yorkshire), (USA, Colorado)
> Output RDD: (USA, [California, Colorado]), (UK, Yorkshire)
>
> Is it possible to use reduceByKey or foldByKey to achieve this, instead of
> groupByKey?
>
> Something equivalent to a cons operator from LISP, so that I could just
> say reduceByKey(lambda x, y: (cons x y)). Maybe it is more a Python
> question than a Spark question: how to create a list from 2 elements
> without a starting empty list?
>
> Thanks,
> Kannappan
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org

--
www.skrasser.com
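P.S. In case it's useful, here is a quick way to sanity-check the one-liner above without a Spark cluster: since reduceByKey only invokes the function for keys with more than one value, its per-key behavior can be simulated in plain Python with functools.reduce (the pair data below just mirrors the example from your message):

```python
from functools import reduce
from collections import defaultdict

# The one-liner: wraps non-list values in lists and concatenates,
# so repeated reductions build up a flat list of values.
combine = lambda a, b: (a if type(a) == list else [a]) + (b if type(b) == list else [b])

pairs = [("USA", "California"), ("UK", "Yorkshire"), ("USA", "Colorado")]

# Group values per key, then fold each group with the lambda.
# Like reduceByKey, reduce() never calls the function for a
# single-element group, so 'Yorkshire' stays a plain string.
grouped = defaultdict(list)
for k, v in pairs:
    grouped[k].append(v)
result = {k: reduce(combine, vs) for k, vs in grouped.items()}
# result == {"USA": ["California", "Colorado"], "UK": "Yorkshire"}
```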