Hi, thanks for the reply. Unfortunately I can't share details, as the work is
classified and fairly complex; it isn't a general problem I'm solving, but one
related to dynamic SQL order of execution against a database. I need to use
Spark because my other jobs, which don't use coalesce, also use Spark. My
source data is Hive ORC table partitions, and with Spark it is easy to load
ORC files into a DataFrame. Initially I have 24 ORC files/splits and hence 24
partitions, but when I call sourceFrame.toJavaRDD().coalesce(1, true) the job
gets stuck: it hangs for hours and does nothing. I'm sure it is not even
hitting the 2 GB limit, since the dataset is small, so I don't understand why
it just hangs there. I have seen the same code run fine over the weekend, when
the dataset is smaller than its regular size.
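For reference, a minimal sketch of the two coalesce variants being discussed, written against the Scala RDD/DataFrame API (the table path and app name are placeholders, not from the original post):

```scala
import org.apache.spark.sql.SparkSession

object CoalesceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("coalesce-sketch")   // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Load the ORC partitions; the path is a placeholder.
    val sourceFrame = spark.read.orc("/warehouse/mytable")

    // coalesce(1) with shuffle = false folds all upstream computation
    // into a single task, so one executor does all the work; on a
    // non-trivial dataset that single slow task can look like a hang.
    val narrow = sourceFrame.rdd.coalesce(1, shuffle = false)

    // coalesce(1, shuffle = true), as in the post, is equivalent to
    // repartition(1): the 24 input tasks still run in parallel, and a
    // shuffle feeds one downstream task.
    val shuffled = sourceFrame.rdd.coalesce(1, shuffle = true)

    // The same thing expressed on the DataFrame API directly:
    val single = sourceFrame.repartition(1)

    spark.stop()
  }
}
```

Even with shuffle = true, the final stage is a single task that pulls every shuffle block, so checking that stage's task metrics in the Spark UI (rather than assuming a hang) is usually the first diagnostic step.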



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-optimiz-and-make-this-code-faster-using-coalesce-1-and-mapPartitionIndex-tp25947p25966.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
