On Thu, Jul 31, 2014 at 08:28 PM, Jiaxin Shi <shijiaxin...@gmail.com> wrote:
> We have a 6-nodes cluster, each node has 64GB memory.
> [...]
> But it ran out of memory. I also try 2D and 1D partition.
>
> And I also try Giraph under the same configuration, and it runs for 10
> iterations, and then it ran out of memory as well.
If Giraph is also running out of memory, it sounds like the graph is just too big to fit entirely in memory on your cluster. In that case, you could try changing the storage level from MEMORY_ONLY (the default) to MEMORY_AND_DISK. That would allow GraphX to spill partitions to disk, hurting performance but at least allowing the computation to finish. You can do this by passing --edgeStorageLevel=MEMORY_AND_DISK --vertexStorageLevel=MEMORY_AND_DISK to spark-submit.

> Should the numEPart equal to the number of nodes or number of nodes*cores?
> I think if numEPart is smaller, it will require less memory, just like the
> powergraph.

Right, increasing the number of edge partitions will increase the memory and communication overhead for both GraphX and PowerGraph. Setting the number of edge partitions to the total number of cores (nodes * cores) is a good starting point, since that will allow GraphX to exploit parallelism fully, and you can experiment with half or double that number if necessary.

Ankur
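
P.S. In case it helps, here is a rough sketch of the same settings applied programmatically from the Scala shell rather than through the spark-submit flags. It assumes a Spark build recent enough that GraphLoader.edgeListFile accepts edgeStorageLevel/vertexStorageLevel parameters; the HDFS path and the 8-cores-per-node figure are just placeholders for your own values.

    import org.apache.spark.graphx.{GraphLoader, PartitionStrategy}
    import org.apache.spark.storage.StorageLevel

    // Placeholder sizing: 6 nodes * 8 cores = 48 edge partitions.
    val numEPart = 6 * 8

    // Load the edge list with spill-to-disk storage levels so partitions
    // that don't fit in memory go to local disk instead of failing.
    // `sc` is the SparkContext provided by spark-shell.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges.txt",
        numEdgePartitions = numEPart,
        edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
        vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)
      .partitionBy(PartitionStrategy.EdgePartition2D) // optional 2D partitioning

    // Example computation: PageRank run to a convergence tolerance of 0.0001.
    val ranks = graph.pageRank(0.0001).vertices

Note that partitionBy shuffles the edges an extra time, so you can drop that line if the initial load itself is what's running out of memory.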