I am not exactly sure how to use MultipleOutput in Spark. I have been looking into Apache Crunch, and its user guide (http://crunch.apache.org/user-guide.html) states:
"Multiple outputs: Spark doesn't have a concept of multiple outputs; when you write a data set to disk, the pipeline that creates that data set runs immediately. This means that you need to be a little bit clever about caching intermediate stages so you don't end up re-running a big long pipeline multiple times in order to write a couple of outputs. Crunch does that for you, along with the same output format and parameter wrapping you get for multiple inputs."

Is this correct or is there another way of solving the problem?
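For reference, my understanding of the "caching intermediate stages" part, written in plain Spark, is roughly the sketch below. It assumes a local master and a tiny in-memory data set; the object name `CachedOutputs`, the helper `writeOutputs`, and the output paths are made up for illustration and are not taken from Crunch or from the guide. The shared RDD is persisted once, so each `saveAsTextFile` call can reuse the cached partitions rather than recomputing the whole lineage.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object CachedOutputs {
  // Hypothetical helper: writes two outputs derived from one cached RDD.
  // baseDir must not already contain the "a" or "b" subdirectories,
  // because saveAsTextFile refuses to overwrite an existing path.
  def writeOutputs(sc: SparkContext, baseDir: String): Unit = {
    // Stand-in for the "big long pipeline"; the sample data is an assumption.
    val records = sc.parallelize(Seq("a,1", "b,2", "a,3"))
      .map(_.split(","))
      .map(parts => (parts(0), parts(1).toInt))
      .persist(StorageLevel.MEMORY_AND_DISK) // keep the computed partitions around

    // Each save is a separate Spark job; the second one can read from the cache.
    records.filter(_._1 == "a").saveAsTextFile(s"$baseDir/a")
    records.filter(_._1 == "b").saveAsTextFile(s"$baseDir/b")

    records.unpersist() // release the cached partitions once both outputs exist
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cached-outputs").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try writeOutputs(sc, args.headOption.getOrElse("out"))
    finally sc.stop()
  }
}
```

This still runs one job per output directory, so it only avoids recomputing the upstream stages; it does not write several outputs in a single pass.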