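Are you suggesting that the String concatenation is slow? It probably is,
because of all the allocation: every a + " " + b creates a new String and
copies both operands into it, so each key's growing result is copied again
and again.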
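Consider folding with an empty StringBuilder as the zero value instead. Note
that foldByKey requires the zero value to have the same type as the values
(String here), so aggregateByKey is the variant that accepts a StringBuilder
zero. Appending into one builder per key will build up the result far more
efficiently.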
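A minimal sketch of the StringBuilder approach, assuming non-empty String values and a single-space separator to match the original reduceByKey. So that it runs without a Spark dependency, the hypothetical object ConcatByKey defines the two functions aggregateByKey would take (appendValue as seqOp, mergeBuilders as combOp) and exercises them on plain Scala collections, using a local stand-in (concatByKey, also hypothetical) for Spark's per-partition fold and merge. None of these names come from Spark itself.

```scala
import scala.collection.mutable.StringBuilder

// Hypothetical helper object; names are illustrative, not part of Spark.
object ConcatByKey {

  // seqOp: append one value to the key's buffer, space-separated.
  // Mutates and returns the same builder instead of allocating a new String.
  // Assumes values are non-empty (an empty first value would drop a separator).
  def appendValue(sb: StringBuilder, value: String): StringBuilder = {
    if (sb.length > 0) sb.append(' ')
    sb.append(value)
  }

  // combOp: merge two partial buffers (e.g. results from two partitions)
  // by appending the right one onto the left one.
  def mergeBuilders(left: StringBuilder, right: StringBuilder): StringBuilder = {
    if (left.length > 0 && right.length > 0) left.append(' ')
    left.append(right)
  }

  // Local stand-in for aggregateByKey: fold each "partition" with appendValue,
  // starting from a fresh empty builder per key, then merge the partials.
  def concatByKey(partitions: Seq[Seq[(Int, String)]]): Map[Int, String] = {
    val partials: Seq[Map[Int, StringBuilder]] = partitions.map { part =>
      part.foldLeft(Map.empty[Int, StringBuilder]) { case (acc, (k, v)) =>
        val sb = acc.getOrElse(k, new StringBuilder) // fresh zero value per key
        acc.updated(k, appendValue(sb, v))
      }
    }
    // groupBy keeps partition order within each key's group.
    val merged = partials.flatten.groupBy(_._1).map { case (k, kvs) =>
      k -> kvs.map(_._2).reduce(mergeBuilders)
    }
    merged.map { case (k, sb) => k -> sb.toString }
  }
}
```

In Spark the same two functions would plug in roughly as `pairs.aggregateByKey(new StringBuilder)(ConcatByKey.appendValue, ConcatByKey.mergeBuilders).mapValues(_.toString)`. The PairRDDFunctions documentation for aggregateByKey states that both functions may modify and return their first argument to avoid memory allocation, which is what makes the mutable builder acceptable here. As with reduceByKey, the order in which values end up concatenated across partitions is not something to rely on.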
On Nov 10, 2014 8:37 AM, "YANG Fan" wrote:
Hi,
I've got a huge list of key-value pairs, where the key is an integer and
the value is a long string (around 1 KB). I want to concatenate the strings
that share the same key.
Initially I did something like: pairs.reduceByKey((a, b) => a+" "+b)
Then tried to save the result to HDFS. But it was extrem