But this applies only to operations that have a shuffle phase. It would not help a simple map operation where a single record is mapped to a huge new value, which can still end in an OutOfMemoryError.
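To make the distinction concrete, here is a minimal sketch (the record counts, array sizes, and partition numbers are made up for illustration): a map that inflates each record into one huge value has to hold that value on the executor heap regardless of any shuffle setting, whereas emitting it as smaller pieces over more partitions keeps the per-task footprint bounded.

import org.apache.spark.{SparkConf, SparkContext}

object MapInflationSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("map-inflation-sketch"))

    // Hypothetical input: a small RDD of record ids.
    val records = sc.parallelize(1 to 1000, 10)

    // Problematic: each record is mapped to one huge value that must fit on
    // the executor heap in full; spark.shuffle.memoryFraction never comes
    // into play because there is no shuffle here.
    // val inflated = records.map(id => Array.fill(50 * 1000 * 1000)(id.toByte))

    // One mitigation: emit the big value as many small chunks over more
    // partitions, so each task holds only a small piece at a time.
    val chunked = records
      .repartition(200)
      .flatMap { id =>
        (0 until 50).iterator.map(chunk => (id, chunk, Array.fill(1000 * 1000)(id.toByte)))
      }

    println(chunked.count())
    sc.stop()
  }
}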
On Mon, Aug 18, 2014 at 12:34 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> I believe spark.shuffle.memoryFraction is the one you are looking for.
>
> spark.shuffle.memoryFraction: Fraction of Java heap to use for aggregation
> and cogroups during shuffles, if spark.shuffle.spill is true. At any given
> time, the collective size of all in-memory maps used for shuffles is
> bounded by this limit, beyond which the contents will begin to spill to
> disk. If spills are often, consider increasing this value at the expense
> of spark.storage.memoryFraction.
>
> You can give it a try.
>
> Thanks
> Best Regards
>
> On Mon, Aug 18, 2014 at 12:21 PM, Ghousia <ghousia.ath...@gmail.com> wrote:
>
>> Thanks for the answer Akhil. We are right now getting rid of this issue
>> by increasing the number of partitions, and we are persisting RDDs to
>> DISK_ONLY. But the issue is with heavy computations within an RDD. It
>> would be better if we had the option of spilling the intermediate
>> transformation results to local disk (only in case memory consumption is
>> high). Do we have any such option available with Spark? If increasing the
>> partitions is the only way, then one might end up with OutOfMemory errors
>> when working with certain algorithms where the intermediate result is huge.
>>
>> On Mon, Aug 18, 2014 at 12:02 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>>
>>> Hi Ghousia,
>>>
>>> You can try the following:
>>>
>>> 1. Increase the heap size
>>> <https://spark.apache.org/docs/0.9.0/configuration.html>
>>> 2. Increase the number of partitions
>>> <http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
>>> 3. You could try persisting the RDD using DISK_ONLY
>>> <http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence>
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <ghousia.ath...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to implement machine learning algorithms on Spark. I am
>>>> working on a 3-node cluster, with each node having 5GB of memory.
>>>> Whenever I work with a slightly larger number of records, I end up with
>>>> an OutOfMemory error. The problem is that even if the number of records
>>>> is only slightly higher, the intermediate result from a transformation
>>>> is huge, and this results in an OutOfMemory error. To overcome this, we
>>>> are partitioning the data such that each partition has only a few
>>>> records.
>>>>
>>>> Is there any better way to fix this issue? Something like spilling the
>>>> intermediate data to local disk?
>>>>
>>>> Thanks,
>>>> Ghousia.
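For reference, a minimal sketch of how the tuning suggested in the quoted mails could be wired up (assuming Spark 1.x, where spark.shuffle.memoryFraction and spark.shuffle.spill are the relevant knobs; the fraction values, input path, and partition count below are illustrative, not tuned recommendations):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object OomTuningSketch {
  def main(args: Array[String]): Unit = {
    // Raise the shuffle fraction at the expense of the storage fraction, as
    // suggested above (Spark 1.x defaults were 0.2 and 0.6); the exact
    // values here are placeholders.
    val conf = new SparkConf()
      .setAppName("oom-tuning-sketch")
      .set("spark.shuffle.spill", "true")
      .set("spark.shuffle.memoryFraction", "0.4")
      .set("spark.storage.memoryFraction", "0.5")

    val sc = new SparkContext(conf)

    // More partitions mean fewer records, and smaller intermediate results,
    // per task. The input path and partition count are hypothetical.
    val data = sc.textFile("hdfs:///path/to/input", 400)

    // Keep the intermediate RDD off the executor heap entirely.
    val intermediate = data.map(_.split(",")).persist(StorageLevel.DISK_ONLY)

    println(intermediate.count())
    sc.stop()
  }
}

Note that spark.shuffle.memoryFraction only governs shuffle-side aggregation buffers, which is exactly why it does not cover the map-only case described at the top of this mail.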