This may seem contrived, but suppose I wanted to create a collection of "single-column" RDDs that contain calculated values, and I want to cache them to avoid recalculating.
i.e.
rdd1 = {Names}
rdd2 = {Star Sign}
rdd3 = {Age}
Then I want to create a new virtual RDD that combines some of these
RDDs side by side into a "multi-column" RDD:
rddA = {Names, Age}
rddB = {Names, Star Sign}
I saw that rdd.union() merges rows, but is there anything that can combine columns?
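To make "combine columns" concrete, here is a rough sketch of the result I'm after, using plain Python lists as stand-ins for the cached RDDs. This is not Spark code: combine_columns is a hypothetical helper built on Python's built-in zip, and the names, star signs and ages are made-up sample values. It assumes every column has the same number of rows in the same order.

```python
# Plain Python lists standing in for the cached single-column RDDs.
# The values are made-up sample data, purely for illustration.
names = ["Ann", "Bob", "Cid"]
star_signs = ["Aries", "Leo", "Virgo"]
ages = [31, 45, 27]


def combine_columns(*columns):
    """Pair the i-th element of every column into one row tuple.

    Hypothetical helper: it only illustrates the desired semantics
    and is not part of any Spark API.
    """
    lengths = {len(column) for column in columns}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same number of rows")
    return list(zip(*columns))


# rddA = {Names, Age}
rdd_a = combine_columns(names, ages)
# rddB = {Names, Star Sign}
rdd_b = combine_columns(names, star_signs)

print(rdd_a)
print(rdd_b)
```

So each row of the combined result is built purely by position, without recomputing any of the underlying columns.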
Cheers
- Ian
