Hello,
I am currently working on a project in which I spawn an Apache Spark MLlib
job in Standalone mode from a running Java process.
The Spark job contains the following code:
SparkConf sparkConf = new SparkConf().setAppName("SparkParallelLoad");
sparkConf.set("spark.executor.memory", "8g");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
...
Also, in my ~/spark/conf/spark-env.sh I have the following values:
SPARK_WORKER_CORES=1
export SPARK_WORKER_CORES=1
SPARK_WORKER_MEMORY=2g
export SPARK_WORKER_MEMORY=2g
SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.spark.executor.memory=4g"
At runtime I get a Java OutOfMemoryError and a core dump. My dataset is
smaller than 1 GB, and I want to make sure that all of it is cached in
memory for my ML task.
Am I increasing the JVM heap memory correctly? Am I doing something wrong?
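For reference, here is a minimal sketch of how the heap actually granted to a JVM could be checked. It uses only the standard Runtime API. The class name HeapCheck and its helper methods are hypothetical and not part of my job. It reports the heap of whichever JVM it runs in, so calling it from the driver would show the driver's heap, not that of the executors, which run as separate JVMs.

```java
// Hypothetical helper, not part of the Spark job above.
class HeapCheck {
    // Converts a byte count to whole mebibytes (integer division).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Maximum heap the current JVM will attempt to use.
    // This reflects -Xmx when it is set; otherwise the JVM default.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Comparing the printed value with the configured memory settings should show whether a given setting reached that JVM at all.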
Thank you,
Nick