I know the cache operation can keep data in memory or on disk.
But I would like to know whether other operations do the same.
For example, I created a DataFrame called df. It is large, so when I run
actions such as:
df.sort(column_name).show()
df.collect()
Spark throws errors like:
16/05/17 10:53:36 ERROR Executor: Managed memory leak detected; size = 2359296 bytes, TID = 15
16/05/17 10:53:36 ERROR Executor: Exception in task 0.0 in stage 12.0 (TID 15)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 111, in main
    process()
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/worker.py", line 106, in process
    serializer.dump_stream(func(split_index, iterator), outfile)
  File "/opt/spark-1.6.0-bin-hadoop2.6/python/lib/pyspark.zip/pyspark/serializers.py", line 263, in dump_stream
    vs = list(itertools.islice(iterator, batch))
  File "<stdin>", line 1, in <lambda>
IndexError: list index out of range
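
The last two frames of the traceback point at a lambda typed into the shell, not at Spark's memory manager. The lambda itself is not shown in this message, so the following is only a minimal sketch: it assumes a lambda that indexes into a split row (the rows and the names get_third and safe_third are hypothetical) and reproduces the same IndexError in plain Python, without Spark.

```python
# Hypothetical input rows: the second row has fewer fields than expected.
rows = ["a,1,x", "b,2"]

# Assumed shape of the lambda; the real one is not shown in the message.
get_third = lambda line: line.split(",")[2]


def safe_third(line, default=None):
    """Return the third comma-separated field, or `default` for short rows."""
    fields = line.split(",")
    return fields[2] if len(fields) > 2 else default


# The unguarded lambda fails on the short row, as in the traceback.
error_message = None
try:
    [get_third(r) for r in rows]
except IndexError as exc:
    error_message = str(exc)

# The guarded version returns a default instead of raising.
safe_values = [safe_third(r) for r in rows]

print(error_message)
print(safe_values)
```

If the lambda behind df looks anything like this, the same failure would occur however much memory the executors have, because show() and collect() are the first calls that actually evaluate it.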
Is there any setting or configuration that lets Spark swap memory out to
disk in this situation?