GraphX twitter

2014-11-18 Thread tom85
I'm having problems running the Twitter graph on a cluster of 4 nodes, each with over 100GB of RAM and 32 virtual cores. I do have a pre-installed Spark version (built against Hadoop 2.3, because it didn't compile on my system), but I'm loading my graph file from disk without HDFS. T

Re: Pagerank implementation

2014-11-18 Thread tom85
I see, thanks. So to implement PageRank with the damping factor divided by the number of vertices: is it sufficient to modify initialMessage to *val initialMessage = (resetProb / graph.vertices.count()) / (1.0 - resetProb)* instead of *val initialMessage = resetProb / (1.0 - resetProb)* and yield correc
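
For reference, the two variants being compared can be written as fixed-point equations. This is only a sketch: it assumes ranks start at zero and that the vertex program adds (1.0 - resetProb) * msgSum to the old rank, neither of which is stated in the post above. Here \alpha stands for resetProb and N for graph.vertices.count(); both symbols are introduced for illustration.

```latex
% Unnormalized variant (initialMessage = \alpha / (1 - \alpha)):
PR(v) = \alpha + (1 - \alpha) \sum_{u \to v} \frac{PR(u)}{\mathrm{outdeg}(u)}

% Normalized variant (initialMessage = (\alpha / N) / (1 - \alpha)):
PR'(v) = \frac{\alpha}{N} + (1 - \alpha) \sum_{u \to v} \frac{PR'(u)}{\mathrm{outdeg}(u)}

% Both systems are linear in the rank vector, so dividing the first
% equation by N gives the second:
PR'(v) = \frac{PR(v)}{N}
```

Under these assumptions the two variants differ only by the constant factor N. One caveat: a convergence tolerance that is compared against per-vertex deltas would then be acting on values N times smaller, so it would likely need to be scaled as well.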

Pagerank implementation

2014-11-15 Thread tom85
Hi, I wonder if the PageRank implementation is correct. More specifically, I am looking at the following function from PageRank.scala, which is given to Pregel: def vertexProgram(id: