Re: how to save RDD partitions in different folders?

dmpour23 Tue, 22 Apr 2014 02:28:53 -0700

I am not exactly sure how to use MultipleOutput in Spark. I have been looking
into Apache Crunch; its user guide (http://crunch.apache.org/user-guide.html)
states:


Multiple outputs: Spark doesn't have a concept of multiple outputs; when you
write a data set to disk, the pipeline that creates that data set runs
immediately. This means that you need to be a little bit clever about
caching intermediate stages so you don't end up re-running a big long
pipeline multiple times in order to write a couple of outputs. Crunch does
that for you, along with the same output format and parameter wrapping you
get for multiple inputs.
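
If I read that right, the "be clever about caching" part would look roughly
like the sketch below in plain Spark. This is only my guess at the idea, not
something taken from the Crunch guide: the input data, the object name and
the /tmp/split-output base path are all made up for illustration, and it
assumes the set of distinct keys is small enough to collect to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SplitOutputSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("split-output-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Hypothetical input: (folder key, record) pairs standing in for the real pipeline.
    val records = sc.parallelize(Seq(("a", "row1"), ("b", "row2"), ("a", "row3")))

    // Cache so each save below reuses the computed data instead of
    // re-running the whole pipeline once per output.
    records.cache()

    // Collect the distinct keys to the driver (assumes there are only a few).
    val keys = records.map(_._1).distinct().collect()

    // One Spark job and one output folder per key.
    // "/tmp/split-output" is a made-up base path; each target folder must not exist yet.
    keys.foreach { k =>
      records.filter(_._1 == k).map(_._2).saveAsTextFile(s"/tmp/split-output/$k")
    }

    records.unpersist()
    sc.stop()
  }
}
```

Each key triggers its own filter pass over the cached RDD, so this only seems
reasonable when there are a handful of output folders.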

Is this correct or is there another way of solving the problem?



