I'm assuming the dataset you're dealing with is large, which is why you want to allocate your full 16 GB of RAM to it.
I suggest launching the PySpark shell with the driver memory set explicitly, e.g. "pyspark --driver-memory 16g". Also, if you cache your data and it doesn't fully fit in memory, use persist() with a storage level (cache() takes no arguments): df.persist(StorageLevel.MEMORY_AND_DISK), so partitions that don't fit in memory spill to disk instead of throwing a heap error.
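A minimal sketch of the idea as a standalone script (run with spark-submit rather than the shell); the app name and dataset path are hypothetical, and note that spark.driver.memory must be set before the driver JVM starts, so passing it on the command line is the reliable route:

    from pyspark import StorageLevel
    from pyspark.sql import SparkSession

    # Equivalent to launching with: pyspark --driver-memory 16g
    # (setting it here only works when the session is created by this script,
    # e.g. via spark-submit, not inside an already-running shell).
    spark = (SparkSession.builder
             .appName("heap-error-example")          # hypothetical app name
             .config("spark.driver.memory", "16g")
             .getOrCreate())

    # Hypothetical path to the large dataset.
    df = spark.read.parquet("hdfs:///path/to/large/dataset")

    # persist() accepts a storage level; MEMORY_AND_DISK spills partitions
    # that don't fit in memory to disk instead of failing with OOM.
    df.persist(StorageLevel.MEMORY_AND_DISK)

    df.count()  # first action materializes the persisted data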