Hi there, 

I have a bunch of data in an RDD that I previously processed element by
element. For example, given an RDD declared as "data: RDD[Object]", I
processed it with "data.map(...)". However, I now have a new requirement to
process the data in a batched way, which means converting the RDD from
RDD[Object] to RDD[Array[Object]] before processing it. Concretely, I need
to fill in this function:

  def convert2array(inputs: RDD[Object], batchedDegree: Int): RDD[Array[Object]] = {...}

After the conversion, each element of the new RDD should be an array of
elements from the previous RDD. The parameter "batchedDegree" specifies how
many elements are batched together. For example, with batchedDegree = 100
and 210 elements in the original RDD, the conversion should produce an RDD
with 3 elements: the first two arrays contain elements 1~100 and 101~200,
and the third array contains elements 201~210.

I was wondering if anybody could help me complete this Scala function in an
efficient way; a rough sketch of what I have in mind is included below.
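
Here is a minimal, untested sketch of the approach I am considering: group
the elements within each partition via mapPartitions. This avoids a
shuffle, but it assumes it is acceptable for the last batch of each
partition (not just of the whole RDD) to hold fewer than batchedDegree
elements:

  import org.apache.spark.rdd.RDD

  def convert2array(inputs: RDD[Object], batchedDegree: Int): RDD[Array[Object]] = {
    // Chunk each partition's iterator into fixed-size groups; the final
    // group of a partition may be shorter than batchedDegree.
    inputs.mapPartitions(_.grouped(batchedDegree).map(_.toArray))
  }

With a single partition, the 210-element example above would yield exactly
three arrays of sizes 100, 100, and 10. If exact batch boundaries were
required across partitions, I imagine something like zipWithIndex followed
by grouping on (index / batchedDegree) could work instead, at the cost of
a shuffle.

Thanks a lot.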



