I set SPARK_MEM in the driver process by setting
"spark.executor.memory" to 10G. Each machine had 32G of RAM and a
dedicated 32G spill volume. I believe the units in the numbers you
quoted are pages, and the page size is the standard 4K. There are 15
slave nodes in the cluster, and the sizes of the datasets I'm joining
are described in my original message below.
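
For reference, a rough sketch of how that setting would be passed
through PySpark's branch-0.9 SparkConf API; the master URL and
application name below are placeholders, not details from this thread:

from pyspark import SparkConf, SparkContext

# Sketch of the configuration described above (branch-0.9 style).
# The master URL and app name are placeholders.
conf = (SparkConf()
        .setMaster("spark://master:7077")
        .setAppName("join-test")
        .set("spark.executor.memory", "10g"))
sc = SparkContext(conf=conf)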
A JVM can easily be limited in how much memory it uses with the -Xmx
parameter, but Python has no comparably first-class, built-in memory
limit. Maybe the memory limits aren't making it to the Python
executors.
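
As an illustration of that point: the closest Python analogue to -Xmx
is an OS-level address-space limit that the process sets on itself,
e.g. via the resource module. As far as I know Spark does not do this
for its Python workers; the snippet below is just a sketch of what
such a limit looks like (Unix-only, and the 10 GB cap is a made-up
number):

import resource

# Hypothetical ~10 GB cap on the process address space (Unix-only).
# This is the kind of hard limit a JVM gets for free from -Xmx, but a
# Python worker would have to impose on itself.
soft, hard = resource.getrlimit(resource.RLIMIT_AS)
resource.setrlimit(resource.RLIMIT_AS, (10 * 1024 ** 3, hard))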
What was your SPARK_MEM setting? The JVM below seems to be using
603201 (pages?).
On Mon, Apr 7, 2014 at 7:37 PM, Brad Miller wrote:
> I am running the latest version of PySpark branch-0.9 and having some
> trouble with join.
>
> One RDD is about 100G (25GB compressed and serialized in memory) with
> 130K records, the other RDD is about 10G (2.5G compressed and
> serialized in memory).
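
For context, a minimal, self-contained sketch of the kind of job being
described, using tiny made-up stand-ins for the real 100G / 10G pair
RDDs. The local master, the record contents, and the use of
MEMORY_ONLY_SER (the branch-0.9 / 1.x PySpark storage level) together
with spark.rdd.compress=true to approximate "compressed and serialized
in memory" are all assumptions on my part, not details from the thread:

from pyspark import SparkConf, SparkContext, StorageLevel

conf = (SparkConf()
        .setMaster("local[2]")   # placeholder; the real job ran on a 15-node cluster
        .setAppName("join-sketch")
        .set("spark.rdd.compress", "true"))
sc = SparkContext(conf=conf)

# Tiny stand-ins for the two (key, value) RDDs described above.
big = sc.parallelize([(i % 1000, "x" * 100) for i in range(10000)])
small = sc.parallelize([(i, "y") for i in range(1000)])

# Cache both sides serialized; with spark.rdd.compress=true the cached
# blocks are also compressed.
big.persist(StorageLevel.MEMORY_ONLY_SER)
small.persist(StorageLevel.MEMORY_ONLY_SER)

joined = big.join(small)   # the step reported as troublesome at scale
print(joined.count())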