Re: dealing with large values in kv pairs

2014-11-10 Thread Sean Owen
You are suggesting that the String concatenation is slow? It probably is, because of all the allocation. Consider foldByKey instead, which starts with an empty StringBuilder as its zero value. This will build up the result far more efficiently.

On Nov 10, 2014 8:37 AM, "YANG Fan" wrote:
> Hi,
>
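
A minimal sketch of the builder-based approach suggested in this reply, written in Scala to match the thread. One assumption to flag: foldByKey takes a zero value of the same type as the RDD's values (String here), so this sketch expresses a StringBuilder accumulator with aggregateByKey rather than foldByKey. The names ConcatByKey, append, merge, and concatLocal are hypothetical. concatLocal is only a local stand-in so the two functions can be exercised without a cluster.

```scala
import scala.collection.mutable

object ConcatByKey {
  // seqOp: append one value to the per-key builder, space-separated.
  // Assumes values are non-empty, as in the thread (strings of around 1 KB).
  def append(sb: StringBuilder, s: String): StringBuilder = {
    if (sb.length > 0) sb.append(' ')
    sb.append(s)
  }

  // combOp: merge two partial builders for the same key.
  def merge(a: StringBuilder, b: StringBuilder): StringBuilder = {
    if (a.length > 0 && b.length > 0) a.append(' ')
    a.append(b)
  }

  // With Spark on the classpath and pairs: RDD[(Int, String)], the call
  // would look roughly like this (assumes Spark 1.1+ for aggregateByKey):
  //
  //   pairs.aggregateByKey(new StringBuilder)(append, merge)
  //        .mapValues(_.toString)

  // Local stand-in (hypothetical): applies append per key over a plain Seq.
  def concatLocal(pairs: Seq[(Int, String)]): Map[Int, String] = {
    val acc = mutable.Map.empty[Int, StringBuilder]
    for ((k, v) <- pairs) append(acc.getOrElseUpdate(k, new StringBuilder), v)
    acc.map { case (k, sb) => k -> sb.toString }.toMap
  }
}
```

The point of the design is that each key reuses one growing buffer instead of allocating a new String on every pairwise a + " " + b. Whether that is the actual bottleneck in the original job is not established by the thread.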

dealing with large values in kv pairs

2014-11-10 Thread YANG Fan
Hi, I've got a huge list of key-value pairs, where the key is an integer and the value is a long string (around 1 KB). I want to concatenate the strings with the same keys. Initially I did something like:

    pairs.reduceByKey((a, b) => a + " " + b)

Then I tried to save the result to HDFS. But it was extrem