bq. all sorts of optimizations like Tungsten

For Tungsten, please use the 1.5.1 release.
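For context, a minimal sketch of what "using Tungsten on 1.5.x" amounts to in practice. The spark.sql.tungsten.enabled flag is on by default in Spark 1.5, so this is mostly a sanity check; the app name is a placeholder:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Sketch: on Spark 1.5.x, Tungsten's code generation and binary
    // memory management are controlled by this flag (default: true).
    val conf = new SparkConf()
      .setAppName("tungsten-check") // placeholder name
      .set("spark.sql.tungsten.enabled", "true")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Confirm the running version really is 1.5.1 -- the flag
    // has no effect on a 1.4.x cluster.
    println(s"Spark version: ${sc.version}")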
On Sat, Oct 10, 2015 at 6:24 PM, Alex Rovner <alex.rov...@magnetic.com> wrote:

> How many executors are you running with? How many nodes are in your
> cluster?
>
> On Thursday, October 8, 2015, unk1102 <umesh.ka...@gmail.com> wrote:
>
>> Hi, as recommended I am caching my Spark job's DataFrame with
>> dataframe.persist(StorageLevels.MEMORY_AND_DISK_SER), but what I see
>> in the Spark job UI is that this persist stage runs for a very long
>> time, showing 10 GB of shuffle read and 5 GB of shuffle write. It
>> takes too long to finish, and because of that my Spark job sometimes
>> times out or throws an OOM, and the executors then get killed by
>> YARN. I am using Spark 1.4.1 with all sorts of optimizations such as
>> Tungsten and Kryo; I have set spark.storage.memoryFraction to 0.2 and
>> spark.shuffle.memoryFraction to 0.2 as well. My data is huge, around
>> 1 TB, and I am using the default 200 partitions for
>> spark.sql.shuffle.partitions. Please help; I am clueless. Please
>> guide me.
>
> --
> Alex Rovner
> Director, Data Engineering
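For reference, a sketch of the setup the original question describes: persisting with MEMORY_AND_DISK_SER under Kryo serialization and the memory-fraction settings mentioned above. The input path, app name, and the 2000-partition value are illustrative assumptions, not values from the thread (which only says the default of 200 was used):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.storage.StorageLevel

    // Sketch of the configuration from the thread (Spark 1.4/1.5 APIs).
    val conf = new SparkConf()
      .setAppName("persist-example") // placeholder name
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.storage.memoryFraction", "0.2")
      .set("spark.shuffle.memoryFraction", "0.2")
      // Illustrative guess for ~1 TB of data; the thread used the
      // default of 200.
      .set("spark.sql.shuffle.partitions", "2000")

    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // "/data/input" is a placeholder path.
    val df = sqlContext.read.parquet("/data/input")

    // Serialized on-heap storage, spilling to disk when memory is short.
    df.persist(StorageLevel.MEMORY_AND_DISK_SER)
    df.count() // action to materialize the cache

One likely factor worth noting: with roughly 1 TB flowing through only 200 shuffle partitions, each task handles on the order of 5 GB, which can easily blow past executor memory; raising spark.sql.shuffle.partitions shrinks the per-task footprint.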