Hi Unk1102, I also had trouble when I used coalesce(). repartition() worked much better. Keep in mind that if you have a large number of partitions you are probably going to have high communication costs.
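For what it's worth, here is a minimal sketch of the repartition() approach on the Spark 1.6 Scala API with the spark-csv package. The input path, header option, and partition count are made-up placeholders, not from this thread. Also note that, as far as I recall, spark-csv takes the compression codec as a key/value option, so the single-argument option("gzip") in the quoted post below looks off:

    // Minimal sketch for Spark 1.6 + spark-csv; paths and the partition
    // count (8) are illustrative placeholders, not from the original post.
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    val sc = new SparkContext(new SparkConf().setAppName("csv-write"))
    val sqlContext = new SQLContext(sc)

    val sourceFrame = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")   // assumption: input has a header row
      .load("/path/input")        // hypothetical input path

    // repartition(n) does a full shuffle and spreads rows evenly over n
    // tasks; coalesce(1) forces all the data through a single task.
    sourceFrame
      .repartition(8)             // pick n for your cluster; 8 is illustrative
      .write
      .format("com.databricks.spark.csv")
      .option("codec", "gzip")    // spark-csv takes the codec as key/value
      .save("/path/hadoop")

If you really do need a single CSV file at the end, it is usually much faster to write several part files this way and then merge them afterwards (e.g. hadoop fs -getmerge) than to force everything through coalesce(1).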
Also, my code works a lot better on 1.6.0. DataFrame memory could not be spilled in 1.5.2, and in 1.6.0 unpersist() actually frees up memory. Another strange thing I noticed in 1.5.1 was that I had thousands of partitions, many of them empty. Having lots of empty partitions really slowed things down; see the sketch after the quoted message below for a quick way to spot them.

Andy

From: unk1102 <umesh.ka...@gmail.com>
Date: Tuesday, January 5, 2016 at 11:58 AM
To: "user @spark" <user@spark.apache.org>
Subject: coalesce(1).saveAsTextfile() takes forever?

> Hi, I am trying to save many partitions of a DataFrame into one CSV file, and it
> takes forever for large data sets of around 5-6 GB.
>
> sourceFrame.coalesce(1).write().format("com.databricks.spark.csv").option("gzip").save("/path/hadoop")
>
> For small data the above code works well, but for large data it hangs forever
> and does not move on, because a single partition has to shuffle GBs of data.
> Please help me.
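For reference, here is a minimal sketch of how one can check for empty partitions and release cached memory on 1.6.0. The DataFrame name sourceFrame is reused from the quoted post; the persist level is just an example:

    import org.apache.spark.storage.StorageLevel

    // Count rows per partition to spot empty ones.
    val sizes = sourceFrame.rdd
      .mapPartitionsWithIndex((idx, rows) => Iterator((idx, rows.size)))
      .collect()
    println(s"${sizes.count(_._2 == 0)} of ${sizes.length} partitions are empty")

    // On 1.6.0, unpersist() actually gives the memory back.
    sourceFrame.persist(StorageLevel.MEMORY_AND_DISK)
    // ... do the work that reuses sourceFrame ...
    sourceFrame.unpersist()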