We have a 6-node cluster; each node has 64 GB of memory.

here is the command:
./bin/spark-submit \
  --class org.apache.spark.examples.graphx.LiveJournalPageRank \
  examples/target/scala-2.10/spark-examples-1.0.1-hadoop1.0.4.jar \
  hdfs://dataset/twitter --tol=0.01 --numEPart=144 --numIter=10
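
For context, here is roughly what I believe that driver does, written out by hand (just a sketch under my assumptions, not the actual example source; the path and partition count simply mirror the command above):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.graphx.GraphLoader

    object PageRankSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("PageRankSketch"))

        // Load the Twitter edge list with 144 edge partitions
        // (in Spark 1.0.x the parameter is called minEdgePartitions).
        val graph = GraphLoader.edgeListFile(sc, "hdfs://dataset/twitter",
          minEdgePartitions = 144).cache()

        // --tol=0.01: run PageRank until the ranks change by less than the tolerance.
        val ranks = graph.pageRank(0.01).vertices
        // --numIter=10 would instead map to the fixed-iteration variant:
        // val ranks = graph.staticPageRank(10).vertices

        ranks.take(5).foreach(println)
        sc.stop()
      }
    }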

But it ran out of memory. I also tried the 2D and 1D partition strategies.
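
Concretely, I believe switching strategies corresponds to something like this (again just a sketch, using GraphX's built-in PartitionStrategy objects):

    import org.apache.spark.graphx.PartitionStrategy

    // 2D edge partitioning: edges placed on a sqrt(numParts) x sqrt(numParts) grid of blocks
    val graph2D = graph.partitionBy(PartitionStrategy.EdgePartition2D)

    // 1D edge partitioning: edges assigned by their source vertex
    val graph1D = graph.partitionBy(PartitionStrategy.EdgePartition1D)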

I also tried Giraph under the same configuration; it ran for 10 iterations and then ran out of memory as well.

Actually, I am not sure whether the command is right.
Should numEPart equal the number of nodes, or the number of nodes * cores?
I would expect a smaller numEPart to require less memory, as it does in
PowerGraph. (A small arithmetic sketch of the two options follows below.)
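
To make the two options concrete, this is the arithmetic I have in mind (the 24 cores per node figure is only an assumed example, not our actual hardware):

    val numNodes = 6
    val coresPerNode = 24                      // assumed example value
    val ePartByNode = numNodes                 // option 1: one edge partition per node -> 6
    val ePartByCore = numNodes * coresPerNode  // option 2: one per core -> 144
    // sc.defaultParallelism is another way to get roughly nodes * cores at runtime.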

Thanks in advance!
