I am trying to run the connected components algorithm in Spark (GraphX). The
graph has roughly 28M edges and 3.2M vertices. Here is the code I am using:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    val inputFile =
      "/user/hive/warehouse/spark_poc.db/window_compare_output_text/000000_0"
    val conf = new SparkConf().setAppName("ConnectedComponentsTest")
    val sc = new SparkContext(conf)
    // canonicalOrientation = true, minEdgePartitions = 7, and both edge and
    // vertex RDDs persisted at MEMORY_AND_DISK so partitions can spill
    val graph = GraphLoader.edgeListFile(sc, inputFile, true, 7,
      StorageLevel.MEMORY_AND_DISK, StorageLevel.MEMORY_AND_DISK)
    graph.cache()
    val cc = graph.connectedComponents()
    // save the per-vertex component assignment (my first version mistakenly
    // saved graph.edges, i.e. the input edges, instead of the cc result)
    cc.vertices.saveAsTextFile("/user/kakn/output")
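In case it is useful, this is how I sanity-check the result (a minimal
sketch, assuming the standard GraphX contract that cc.vertices pairs every
vertex id with the smallest vertex id in its component):

    // count the distinct components in the result
    val numComponents = cc.vertices.map { case (_, comp) => comp }.distinct().count()
    println(s"connected components found: $numComponents")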

and here is the command:

    spark-submit --class ConnectedComponentsTest --master yarn-cluster \
      --num-executors 7 --driver-memory 6g --executor-memory 8g --executor-cores 1 \
      target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
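I have also been wondering whether the containers just need more headroom. A
variant of the command I considered (a sketch only; I am assuming
spark.yarn.driver.memoryOverhead and spark.yarn.executor.memoryOverhead are
available in my Spark version and take values in MB):

    spark-submit --class ConnectedComponentsTest --master yarn-cluster \
      --num-executors 7 --driver-memory 6g --executor-memory 8g --executor-cores 1 \
      --conf "spark.yarn.driver.memoryOverhead=1024" \
      --conf "spark.yarn.executor.memoryOverhead=1024" \
      target/scala-2.10/connectedcomponentstest_2.10-1.0.jar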

It runs for about an hour and then fails with the error below. *Isn't Spark
supposed to spill to disk if the RDDs don't fit into memory?*

Application application_1418082773407_8587 failed 2 times due to AM
Container for appattempt_1418082773407_8587_000002 exited with exitCode:
-104 due to: Container
[pid=19790,containerID=container_1418082773407_8587_02_000001] is running
beyond physical memory limits. Current usage: 6.5 GB of 6.5 GB physical
memory used; 8.9 GB of 13.6 GB virtual memory used. Killing container.
Dump of the process-tree for container_1418082773407_8587_02_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 19790 19788 19790 19790 (bash) 0 0 110809088 336 /bin/bash -c
/usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
-Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
'-Dspark.executor.memory=8g' '-Dspark.eventLog.enabled=true'
'-Dspark.yarn.secondary.jars=' '-Dspark.app.name=ConnectedComponentsTest'
'-Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory'
'-Dspark.master=yarn-cluster' org.apache.spark.deploy.yarn.ApplicationMaster
--class 'ConnectedComponentsTest' --jar
'file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar'
--executor-memory 8192 --executor-cores 1 --num-executors 7 1>
/var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stdout
2>
/var/log/hadoop-yarn/container/application_1418082773407_8587/container_1418082773407_8587_02_000001/stderr
|- 19794 19790 19790 19790 (java) 205066 9152 9477726208 1707599
/usr/java/jdk1.7.0_67-cloudera/bin/java -server -Xmx6144m
-Djava.io.tmpdir=/mnt/DATA1/yarn/nm/usercache/kakn/appcache/application_1418082773407_8587/container_1418082773407_8587_02_000001/tmp
-Dspark.executor.memory=8g -Dspark.eventLog.enabled=true
-Dspark.yarn.secondary.jars= -Dspark.app.name=ConnectedComponentsTest
-Dspark.eventLog.dir=hdfs://<server-name-replaced>:8020/user/spark/applicationHistory
-Dspark.master=yarn-cluster org.apache.spark.deploy.yarn.ApplicationMaster
--class ConnectedComponentsTest --jar
file:/home/kakn01/Spark/SparkSource/target/scala-2.10/connectedcomponentstest_2.10-1.0.jar
--executor-memory 8192 --executor-cores 1 --num-executors 7
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
.Failing this attempt.. Failing the application.
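If I am reading the process-tree dump right, the 6.5 GB limit is my 6g of
--driver-memory plus YARN's default AM memory overhead (384 MB, I believe, in
this version: 6144 MB + 384 MB = 6528 MB, roughly 6.5 GB), and the java
process has an -Xmx6144m heap, so anything the JVM uses outside the heap has
to fit in that 384 MB. Does that mean the container is being killed by
off-heap/JVM overhead rather than by RDD storage, which should have spilled?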


