But this would apply only to operations that have a shuffle phase.

It might not help with a simple map operation where a record is mapped to a
huge new value, which still results in an OutOfMemory error.
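
For example (an illustrative sketch only; "pairs", "records" and
"hugeFeatureVector" are placeholders, not anything from the thread):

  // A shuffle-side aggregation like reduceByKey can spill its in-memory
  // maps to disk once they exceed spark.shuffle.memoryFraction:
  val counts = pairs.reduceByKey(_ + _)

  // A plain map has no shuffle phase, so each output value must fit in
  // the executor heap regardless of that setting:
  val expanded = records.map(r => hugeFeatureVector(r))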



On Mon, Aug 18, 2014 at 12:34 PM, Akhil Das <ak...@sigmoidanalytics.com>
wrote:

> I believe spark.shuffle.memoryFraction is the one you are looking for.
>
> spark.shuffle.memoryFraction : Fraction of Java heap to use for
> aggregation and cogroups during shuffles, if spark.shuffle.spill is true.
> At any given time, the collective size of all in-memory maps used for
> shuffles is bounded by this limit, beyond which the contents will begin to
> spill to disk. If spills are often, consider increasing this value at the
> expense of spark.storage.memoryFraction.
>
> You can give it a try.
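>
> For instance, a minimal sketch (Spark 1.x-style configuration; the values
> below are illustrative assumptions, not recommendations):
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val conf = new SparkConf()
>     .setAppName("ShuffleSpillTuning")
>     .set("spark.shuffle.spill", "true")            // allow in-memory shuffle maps to spill
>     .set("spark.shuffle.memoryFraction", "0.4")    // raised from the 0.2 default
>     .set("spark.storage.memoryFraction", "0.5")    // lowered from the 0.6 default to compensate
>   val sc = new SparkContext(conf)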
>
>
> Thanks
> Best Regards
>
>
> On Mon, Aug 18, 2014 at 12:21 PM, Ghousia <ghousia.ath...@gmail.com>
> wrote:
>
>> Thanks for the answer Akhil. We are currently getting around this issue
>> by increasing the number of partitions, and we are persisting RDDs with
>> DISK_ONLY. But the issue is with heavy computations within an RDD. It
>> would be better if we had the option of spilling the intermediate
>> transformation results to local disk (only when memory consumption is
>> high). Do we have any such option available in Spark? If increasing the
>> partitions is the only way, then one might still end up with OutOfMemory
>> errors when working with certain algorithms where the intermediate result
>> is huge.
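>>
>> (On the "spill only when memory is high" point, a minimal sketch for
>> reference; StorageLevel comes from the Spark API, while inputRdd and
>> heavyTransform are placeholders:)
>>
>>   import org.apache.spark.storage.StorageLevel
>>
>>   // MEMORY_AND_DISK keeps partitions in memory while they fit and writes
>>   // the remainder to local disk, unlike DISK_ONLY which always goes to disk.
>>   val intermediate = inputRdd.map(heavyTransform)
>>     .persist(StorageLevel.MEMORY_AND_DISK)
>>   intermediate.count()   // an action forces the persisted RDD to materialize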
>>
>>
>> On Mon, Aug 18, 2014 at 12:02 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>>> Hi Ghousia,
>>>
>>> You can try the following:
>>>
>>> 1. Increase the heap size
>>> <https://spark.apache.org/docs/0.9.0/configuration.html>
>>> 2. Increase the number of partitions
>>> <http://stackoverflow.com/questions/21698443/spark-best-practice-for-retrieving-big-data-from-rdd-to-local-machine>
>>> 3. You could try persisting the RDD with DISK_ONLY (a rough sketch
>>> combining all three follows below)
>>> <http://spark.apache.org/docs/latest/programming-guide.html#rdd-persistence>
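>>>
>>> A rough sketch of the three together (paths, names and sizes are
>>> placeholders, and spark.executor.memory has to fit within each node's
>>> available memory):
>>>
>>>   import org.apache.spark.{SparkConf, SparkContext}
>>>   import org.apache.spark.storage.StorageLevel
>>>
>>>   val conf = new SparkConf()
>>>     .setAppName("OOMWorkaround")
>>>     .set("spark.executor.memory", "4g")              // 1. bigger heap per executor
>>>   val sc = new SparkContext(conf)
>>>
>>>   val records = sc.textFile("hdfs:///path/to/input") // placeholder input
>>>   val intermediate = records
>>>     .repartition(400)                                // 2. more, smaller partitions
>>>     .map(expensiveTransform)                         // placeholder heavy transformation
>>>     .persist(StorageLevel.DISK_ONLY)                 // 3. keep intermediate results on disk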
>>>
>>>
>>>
>>> Thanks
>>> Best Regards
>>>
>>>
>>> On Mon, Aug 18, 2014 at 10:40 AM, Ghousia Taj <ghousia.ath...@gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am trying to implement machine learning algorithms on Spark. I am
>>>> working on a 3-node cluster, with each node having 5GB of memory.
>>>> Whenever I work with a slightly larger number of records, I end up with
>>>> an OutOfMemory error. The problem is that even when the number of
>>>> records is only slightly higher, the intermediate result of a
>>>> transformation is huge, and this causes the OutOfMemory error. To work
>>>> around this, we are partitioning the data such that each partition has
>>>> only a few records.
>>>>
>>>> Is there any better way to fix this issue? Something like spilling the
>>>> intermediate data to local disk?
>>>>
>>>> Thanks,
>>>> Ghousia.
>>>>
>>>>
>>>>
>>>
>>
>
