PageRank execution imbalance, might hurt performance by 6x

2014-09-27 Thread Larry Xiao
Hi all! I'm running PageRank on GraphX, and I find that some tasks on one machine take 5 to 6 times longer than the others, while the rest are perfectly balanced (around 1 second to finish). Since the time for a stage (iteration) is determined by its slowest task, overall performance suffers. I
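To make the "slowest task decides the stage" arithmetic concrete, here is a minimal sketch. It assumes every task in a stage runs in parallel on its own core (no waiting for free slots), and the task times are hypothetical, not measurements from this thread.

```scala
// Sketch: a stage completes only when its slowest task completes, so stage time
// is the maximum task time. Assumes all tasks run at once, one per core.
object StageTimeSketch {
  /** Wall-clock time of one stage under the one-task-per-core assumption. */
  def stageTime(taskSeconds: Seq[Double]): Double = taskSeconds.max

  /** Ratio of actual stage time to the time it would take if the same total
    * work were spread evenly over the same number of tasks. */
  def slowdown(taskSeconds: Seq[Double]): Double =
    stageTime(taskSeconds) / (taskSeconds.sum / taskSeconds.size)

  def main(args: Array[String]): Unit = {
    // Hypothetical stage: 15 tasks at 1 s plus one straggler at 6 s.
    val tasks = Seq.fill(15)(1.0) :+ 6.0
    println(f"stage time: ${stageTime(tasks)}%.1f s")
    println(f"slowdown vs. balanced: ${slowdown(tasks)}%.2fx")
  }
}
```

With one 6 s straggler among sixteen tasks, the stage takes 6 s even though the balanced time would be about 1.3 s, and a PageRank run pays that cost on every iteration.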

VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao
Hi all, VertexRDD is partitioned with HashPartitioner, and its tasks show some imbalance. For example, Connected Components with partition strategy Edge2D, Aggregated Metrics by Executor: Executor ID | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read | Shuf
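To show what hash partitioning of vertices means in practice, here is a sketch of how a HashPartitioner-style rule maps Long vertex IDs to partitions. It assumes the rule is "non-negative `key.hashCode` modulo the number of partitions", which is how Spark's `HashPartitioner` treats non-null keys; the object and helper names are hypothetical. The strided-ID case shows one way modular hashing can skew counts. It is not necessarily the cause of the imbalance reported above, and balanced counts do not by themselves imply balanced work per partition.

```scala
// Sketch of HashPartitioner-style placement for Long vertex IDs.
// Assumption: partition = nonNegativeMod(key.hashCode, numPartitions).
object VertexHashSketch {
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    raw + (if (raw < 0) mod else 0)
  }

  /** java.lang.Long's hashCode folds the high 32 bits into the low 32 bits. */
  def longHash(v: Long): Int = (v ^ (v >>> 32)).toInt

  def partitionOf(vid: Long, numPartitions: Int): Int =
    nonNegativeMod(longHash(vid), numPartitions)

  /** Number of vertex IDs that land in each partition. */
  def countsPerPartition(vids: Seq[Long], numPartitions: Int): Array[Int] = {
    val counts = Array.fill(numPartitions)(0)
    vids.foreach(v => counts(partitionOf(v, numPartitions)) += 1)
    counts
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical IDs: dense IDs spread evenly over 8 partitions, while IDs
    // that are all multiples of 8 collide on a single partition.
    val dense = 0L until 1000L
    val strided = (0L until 1000L).map(_ * 8)
    println(countsPerPartition(dense, 8).mkString(" "))
    println(countsPerPartition(strided, 8).mkString(" "))
  }
}
```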

Re: GraphX graph partitioning strategy

2014-09-17 Thread Larry Xiao
Could you help take a look? Thank you! Larry On 7/24/14 2:59 PM, Larry Xiao wrote: Hi all, I'm implementing a graph partitioning strategy for GraphX, drawing on research on graph computing. I have two questions: - a specific implementation question: In the current design, only the vertex ID of src a
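As context for the question about a strategy that sees only vertex IDs, here is a sketch of an edge partition function that receives nothing but the source ID, destination ID and partition count. The 2D-grid rule is modeled on GraphX's `EdgePartition2D` in Spark 1.x, and the mixing constant is an assumption taken from that code. This sketch replaces `math.abs` with a non-negative modulo to avoid the `Long.MinValue` corner case, so it is not a verbatim copy.

```scala
// Sketch: an edge partitioner that, like GraphX's PartitionStrategy, sees only
// the two vertex IDs. Grid layout modeled on Spark 1.x EdgePartition2D (assumption).
object Edge2DSketch {
  private val MixingPrime = 1125899906842597L // assumed, from EdgePartition2D

  // Non-negative modulo on Longs.
  private def nonNegMod(x: Long, m: Long): Long = ((x % m) + m) % m

  def getPartition(src: Long, dst: Long, numParts: Int): Int = {
    val side = math.ceil(math.sqrt(numParts.toDouble)).toInt // grid is side x side
    val col = nonNegMod(src * MixingPrime, side).toInt       // column picked by source
    val row = nonNegMod(dst * MixingPrime, side).toInt       // row picked by destination
    (col * side + row) % numParts
  }

  def main(args: Array[String]): Unit = {
    val numParts = 16
    // Every out-edge of vertex 42 stays within one grid column, i.e. at most
    // ceil(sqrt(16)) = 4 partitions, which bounds how often 42 is replicated.
    val parts = (0L until 1000L).map(d => getPartition(42L, d, numParts)).distinct.sorted
    println(parts.mkString(", "))
  }
}
```

Because only IDs are available, the rule can bound replication through hashing but cannot react to vertex degree or other attributes.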

GraphX partitioning and threading details

2014-08-04 Thread Larry Xiao
Hi all, I have some questions about GraphX partitioning details and possible optimizations.
* How are partitions distributed to nodes? And inside a worker, how do partitions get allocated to threads?
  o Is it possible to configure this manually, e.g. partition A => node 1, thread 1?
* How

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
Sorry, I mean I tried the command ./sbt/sbt clean, and now it works. Is it because cached components were not recompiled? On 8/4/14, 4:44 PM, Larry Xiao wrote: I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10), I ran into an assembly problem. $ ./sbt/sbt assembly Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by

Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-03 Thread Larry Xiao
On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10), I ran into an assembly problem:
    $ ./sbt/sbt assembly
    Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
    Note, this will be overridden by -java-home if it is set.
    [info] Loading project definition from ~/spark/project/project
    [info] L

package/assemble with local spark

2014-07-28 Thread Larry Xiao
Hi, how do you package an app with a modified Spark? It seems sbt resolves the dependencies and uses the official Spark release. Thank you! Larry
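One common approach, sketched here as an assumption rather than a confirmed answer from the thread, is to publish the modified Spark to the local Ivy repository from the Spark checkout (for example with `sbt/sbt publish-local`) and point the app at that version. sbt checks the local Ivy repository by default. The version string below is a hypothetical placeholder; use whatever version the modified Spark build declares.

```scala
// build.sbt fragment (sketch). Assumes the modified Spark was already published
// locally; "1.1.0-SNAPSHOT" is a hypothetical placeholder version.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "1.1.0-SNAPSHOT",
  "org.apache.spark" %% "spark-graphx" % "1.1.0-SNAPSHOT"
)
```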

Re: GraphX graph partitioning strategy

2014-07-25 Thread Larry Xiao
On 7/26/14, 4:03 AM, Ankur Dave wrote: Oops, the code should be:
    val unpartitionedGraph: Graph[Int, Int] = ...
    val numPartitions: Int = 128
    def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
    // Get the triplets using GraphX, then use Spark to repartition them
    val partitionedEdges
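The quoted workaround computes triplets first so that the partition function can use vertex attributes, not just IDs. To show why that helps, here is a local sketch that uses plain Scala collections instead of Spark. `Triplet` is a hypothetical stand-in for `EdgeTriplet[Int, Int]`, with the destination's in-degree as its attribute, and `hybridCutPartition` is a hypothetical function, not the elided body of `getTripletPartition`. The rule is a simplified hybrid cut in the spirit of PowerLyra, and the threshold value is an assumption.

```scala
// Sketch: a degree-aware partition function, which needs vertex attributes and so
// cannot be expressed with IDs alone. All names and the threshold are hypothetical.
object HybridCutSketch {
  final case class Triplet(srcId: Long, dstId: Long, dstInDegree: Int)

  private def nonNegMod(x: Long, m: Int): Int = (((x % m) + m) % m).toInt

  def hybridCutPartition(t: Triplet, numPartitions: Int, threshold: Int): Int =
    if (t.dstInDegree <= threshold) nonNegMod(t.dstId, numPartitions) // low degree: keep in-edges together
    else nonNegMod(t.srcId, numPartitions)                            // high degree: spread by source

  def main(args: Array[String]): Unit = {
    // Hypothetical star: 100 edges into vertex 0, plus one ordinary edge 5 -> 7.
    val edges = (1L to 100L).map(s => Triplet(s, 0L, 100)) :+ Triplet(5L, 7L, 1)
    // Local stand-in for repartitioning: count edges per partition.
    val sizes = edges.groupBy(t => hybridCutPartition(t, 4, threshold = 10))
      .map { case (p, es) => p -> es.size }
    println(sizes.toSeq.sorted.mkString(", "))
  }
}
```

Hashing by destination alone would put all 100 in-edges of vertex 0 into one partition. The degree-aware rule spreads them over all four partitions, at the cost of replicating vertex 0.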

GraphX graph partitioning strategy

2014-07-24 Thread Larry Xiao
at advice would you give on partitioning, given the procedure Spark adopts for graph processing? Any advice is much appreciated. Best Regards, Larry Xiao
Reference
- Bipartite-oriented Distributed Graph Partitioning for Big Learning.
- PowerLyra: Differentiated Graph Computation and