I am currently using the RDD aggregate operation to reduce (fold) per partition and then combine using the RDD aggregate operation. def aggregate[U: ClassTag](zeroValue: U)(seqOp: (U, T) => U, combOp: (U, U) => U): U
I need to perform a transform operation after the seqOp and before the combOp. The signature would look like def foldTransformCombine[U: ClassTag](zeroReduceValue: V, zeroCombineValue: U)(seqOp: (V, T) => V, transformOp: (V) => U, combOp: (U, U) => U): U This is especially useful in the scenario where the transformOp is expensive and should be performed once per partition before combining. Is there a way to accomplish this with existing RDD operations? If yes, great but if not, should we consider adding such a general transformation to the list of RDD operations? -Manish