Exactly where the work gets done depends on the RDD in question. I believe that if you use repartition(1) instead of coalesce(1), it will force a shuffle: the upstream work is done in a distributed fashion, and then a single node reads the shuffled data and writes it out.
If you want to write to a single parquet file, however, you will at some point need to bottleneck on a single node.

On Thu, Sep 4, 2014 at 2:02 PM, DanteSama <chris.feder...@sojo.com> wrote:
> Yep, that worked out. Does this solution have any performance implications
> past all the work being done on (probably) 1 node?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/SchemaRDD-Parquet-insertInto-makes-many-files-tp13480p13501.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
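For what it's worth, the two options can be sketched roughly like this against the Spark 1.x SchemaRDD API this thread uses. The variable names and output paths are illustrative, not from the thread, and you'd run this inside an existing SparkContext/SQLContext:

```scala
// Illustrative sketch only (Spark 1.x era API, as used in this thread).
// `sqlContext` is an existing org.apache.spark.sql.SQLContext.
val data = sqlContext.parquetFile("hdfs:///in/events") // a SchemaRDD; path is hypothetical

// coalesce(1): merges partitions WITHOUT a shuffle, so the upstream
// computation can end up collapsing onto (probably) one node as well.
data.coalesce(1).saveAsParquetFile("hdfs:///out/coalesced")

// repartition(1): forces a full shuffle, so the upstream work stays
// distributed; only the final read-and-write is serialized onto one node.
// (repartition on a SchemaRDD returns a plain RDD[Row] in this era,
// so the schema is re-applied before writing.)
sqlContext
  .applySchema(data.repartition(1), data.schema)
  .saveAsParquetFile("hdfs:///out/repartitioned")
```

Either way the output directory contains a single part-file rather than one per partition; the difference is only in where the work before the write happens.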