GraphX graph partitioning strategy

2014-07-24 Thread Larry Xiao
What advice would you give on partitioning, considering the procedure Spark adopts for graph processing? Any advice is much appreciated. Best Regards, Larry Xiao Reference Bipartite-oriented Distributed Graph Partitioning for Big Learning. PowerLyra: Differentiated Graph Computation and

Re: GraphX graph partitioning strategy

2014-07-25 Thread Larry Xiao
On 7/26/14, 4:03 AM, Ankur Dave wrote: Oops, the code should be:
val unpartitionedGraph: Graph[Int, Int] = ...
val numPartitions: Int = 128
def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
// Get the triplets using GraphX, then use Spark to repartition them
val partitionedEdges

package/assemble with local spark

2014-07-28 Thread Larry Xiao
Hi, How do you package an app with a modified Spark? It seems sbt resolves the dependencies and uses the official Spark release. Thank you! Larry

Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-03 Thread Larry Xiao
On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10) I hit an assembly problem. $ ./sbt/sbt assembly Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by -java-home if it is set. [info] Loading project definition from ~/spark/project/project [info] L

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10) I hit an assembly problem. $ ./sbt/sbt assembly Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME. Note, this will be overridden by

Re: Compiling Spark master (6ba6c3eb) with sbt/sbt assembly

2014-08-04 Thread Larry Xiao
Sorry, I mean I tried the command ./sbt/sbt clean and now it works. Is it because cached components were not recompiled? On 8/4/14, 4:44 PM, Larry Xiao wrote: I guessed ./sbt/sbt clean and it works fine now. On 8/4/14, 11:48 AM, Larry Xiao wrote: On the latest pull today

GraphX partitioning and threading details

2014-08-04 Thread Larry Xiao
Hi all, I have some questions about GraphX partitioning details and possible optimizations.
* How are partitions distributed to nodes? And inside a worker, how do partitions get allocated to threads?
  o Is it possible to configure this manually, e.g. partition A => node 1, thread 1?
* How

Re: GraphX graph partitioning strategy

2014-09-17 Thread Larry Xiao
Can you help take a look? Thank you! Larry On 7/24/14 2:59 PM, Larry Xiao wrote: Hi all, I'm implementing a graph partitioning strategy for GraphX, drawing on research in graph computing. I have two questions: - a specific implementation question: In the current design, only vertex ID of src a

VertexRDD partition imbalance

2014-09-25 Thread Larry Xiao
Hi all, VertexRDD is partitioned with HashPartitioner, and its tasks show some imbalance. For example, Connected Components with partition strategy Edge2D: Aggregated Metrics by Executor Executor ID Task Time Total Tasks Failed Tasks Succeeded Tasks Input Shuffle Read Shuf
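
One way hash partitioning can produce uneven partitions is sketched below in plain Scala, without Spark. It relies on the fact that Spark's HashPartitioner places a key in partition key.hashCode modulo the number of partitions (made non-negative). The object name, helper names, and vertex IDs are hypothetical, chosen only to make the skew visible. This is an illustration of the mechanism, not a diagnosis of the imbalance reported in this thread, which may have other causes.

```scala
object HashSkewSketch {
  // Non-negative modulo, the same arithmetic a hash partitioner applies to key.hashCode.
  def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Partition index that a hash-mod scheme would give a Long vertex ID.
  def partitionOf(vertexId: Long, numPartitions: Int): Int =
    nonNegativeMod(vertexId.hashCode, numPartitions)

  // Number of vertex IDs landing in each non-empty partition.
  def histogram(ids: Seq[Long], numPartitions: Int): Map[Int, Int] =
    ids.groupBy(partitionOf(_, numPartitions)).map { case (p, vs) => (p, vs.size) }

  def main(args: Array[String]): Unit = {
    // Hypothetical IDs: all multiples of 4, which share a factor with 8 partitions.
    val ids = (0L until 1000L).map(_ * 4)
    histogram(ids, 8).toSeq.sorted.foreach { case (p, n) =>
      println(s"partition $p: $n vertices")
    }
  }
}
```

With these hypothetical IDs only two of the eight partitions receive any vertices. A prime number of partitions, or IDs without a common factor, spreads them more evenly.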

PageRank execution imbalance, might hurt performance by 6x

2014-09-27 Thread Larry Xiao
Hi all! I'm running PageRank on GraphX, and I find that some tasks on one machine can take 5~6 times longer than the others, which are perfectly balanced (around 1 second each). Since the time for a stage (iteration) is determined by the slowest task, the performance is undesirable. I
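
The arithmetic behind the "6x" in the subject line can be sketched as follows, assuming all tasks of a stage run in parallel in a single wave. The task durations are hypothetical, chosen to match the roughly 1 second versus 5~6 seconds described above, and the object and method names are made up for the sketch.

```scala
object StragglerSketch {
  // Wall-clock time of a stage whose tasks all run in parallel: the slowest task.
  def stageTime(taskSeconds: Seq[Double]): Double = taskSeconds.max

  // Slowdown relative to a stage in which every task takes the median time.
  def slowdown(taskSeconds: Seq[Double]): Double = {
    val sorted = taskSeconds.sorted
    val median = sorted(sorted.length / 2)
    stageTime(taskSeconds) / median
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical durations: seven tasks at 1 s and one straggler at 6 s.
    val tasks = Seq.fill(7)(1.0) :+ 6.0
    println(f"stage time: ${stageTime(tasks)}%.1f s, slowdown: ${slowdown(tasks)}%.1fx")
  }
}
```

Under that single-wave assumption, one straggler sets the time of the whole iteration no matter how well balanced the other tasks are.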