Hi, thanks for the reply. Unfortunately I can't share details: the data is classified, and the problem is fairly specific and complex (it involves dynamically ordering SQL execution against a database), so it isn't a general problem I'm trying to solve. I need to use Spark because my other jobs, which don't use coalesce, already run on Spark. My source data is a partitioned Hive ORC table, and Spark makes it easy to load the ORC files into a DataFrame. Initially I have 24 ORC files/splits, and hence 24 partitions, but when I call sourceFrame.toJavaRDD().coalesce(1, true) the job hangs for hours and does nothing. I'm sure it isn't hitting the 2 GB limit, since the dataset is small, so I don't understand why it just hangs. I have seen the same code run fine over the weekend, when the dataset is smaller than usual.
-- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-optimiz-and-make-this-code-faster-using-coalesce-1-and-mapPartitionIndex-tp25947p25966.html Sent from the Apache Spark User List mailing list archive at Nabble.com.