Hi all!
I'm running PageRank on GraphX, and I find that some tasks on one machine
can take 5~6 times longer than the others, while the rest are perfectly
balanced (around 1 second to finish).
And since the time for a stage (iteration) is determined by its slowest
task, the performance is undesirable.
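For context, a minimal sketch of the kind of run described above, assuming
a SparkContext named sc and an illustrative edge-list path (both are
assumptions, not details from the original mail):

import org.apache.spark.graphx._

// Load an edge list and run PageRank. With a skewed degree
// distribution, a few edge partitions carry far more work, so their
// tasks dominate each iteration's stage time.
val graph = GraphLoader.edgeListFile(sc, "hdfs:///data/edges.txt")
val ranks = graph.pageRank(tol = 0.0001).vertices
ranks.take(5).foreach(println)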
Hi all
VertexRDD is partitioned with HashPartitioner, and it exhibits some
task imbalance.
For example, Connected Components with the EdgePartition2D partition strategy:
Aggregated Metrics by Executor:
Executor ID | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Input | Shuffle Read | Shuffle Write
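For reference, a sketch of the job being measured, assuming the strategy
meant here is GraphX's built-in PartitionStrategy.EdgePartition2D and that
graph is the already-loaded Graph:

import org.apache.spark.graphx._

// 2D edge partitioning, then connected components; per-executor
// metrics like the table above appear on the stage page of the web UI.
val cc = graph
  .partitionBy(PartitionStrategy.EdgePartition2D)
  .connectedComponents()
  .vertices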
Can you help take a look?
Thank you!
Larry
On 7/24/14 2:59 PM, Larry Xiao wrote:
Hi all,
I'm implementing graph partitioning strategies for GraphX, learning from
research on graph computing.
I have two questions:
- a specific implementation question:
In the current design, only the vertex IDs of src and dst
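A strategy implemented against that interface only ever sees those two
vertex IDs and the partition count. As a minimal sketch, assuming the
PartitionStrategy trait from org.apache.spark.graphx (the strategy itself
is a made-up example, not from the original mail):

import org.apache.spark.graphx._

// Hypothetical example strategy: place each edge by hashing only its
// source vertex ID, reusing the mixing prime that GraphX's own
// EdgePartition1D uses.
object SourceHashPartition extends PartitionStrategy {
  override def getPartition(src: VertexId, dst: VertexId,
                            numParts: PartitionID): PartitionID = {
    val mixingPrime: VertexId = 1125899906842597L
    (math.abs(src * mixingPrime) % numParts).toInt
  }
}

// Usage: val g2 = graph.partitionBy(SourceHashPartition)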
Hi all,
I have some questions about GraphX partitioning details and possible
optimizations.
* Can you tell me how partitions are distributed to nodes? And inside a
worker, how do partitions get allocated to threads? (See the sketch
after this list.)
  o Is it possible to make a manual configuration, like partition A =>
node 1, thread 1?
* How
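On the placement question: as far as I know the scheduler decides
placement using locality preferences, and tasks inside a worker run on a
shared thread pool (one per core), so partition => node/thread pinning is
not user-configurable; it can only be observed. A small diagnostic sketch,
assuming a Spark 1.x job with an RDD named edges (an assumption for
illustration):

import org.apache.spark.SparkEnv

// Report which executor actually holds each partition, and its size,
// by running one lightweight task per partition.
val placement = edges.mapPartitionsWithIndex { (pid, iter) =>
  Iterator((pid, SparkEnv.get.executorId, iter.size))
}.collect()
placement.foreach { case (pid, exec, n) =>
  println(s"partition $pid -> executor $exec ($n elements)")
}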
Sorry, I meant: I tried this command
./sbt/sbt clean
and now it works.
Is it because cached components were not recompiled?
On 8/4/14, 4:44 PM, Larry Xiao wrote:
I guessed
./sbt/sbt clean
and it works fine now.
On 8/4/14, 11:48 AM, Larry Xiao wrote:
On the latest pull today (6ba6c3ebfe9a47351a50e45271e241140b09bf10) I hit
an assembly problem.
$ ./sbt/sbt assembly
Using /usr/lib/jvm/java-7-oracle as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
[info] Loading project definition from ~/spark/project/project
[info] L
Hi,
How do you package an app with a modified Spark?
It seems sbt would resolve the dependencies and use the official Spark
release.
Thank you!
Larry
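One common approach, as a sketch (assuming sbt; the app name, Scala
version, and SNAPSHOT version string below are illustrative and must match
what your modified Spark tree actually builds): publish the modified Spark
to the local Ivy repository with ./sbt/sbt publish-local, then depend on
that SNAPSHOT version so sbt resolves your build instead of the official
release.

// Application build.sbt -- a sketch. Assumes you first ran
//   ./sbt/sbt publish-local
// in the modified Spark tree.
name := "my-graphx-app"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-graphx" % "1.1.0-SNAPSHOT"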
On 7/26/14, 4:03 AM, Ankur Dave wrote:
Oops, the code should be:
val unpartitionedGraph: Graph[Int, Int] = ...
val numPartitions: Int = 128
def getTripletPartition(e: EdgeTriplet[Int, Int]): PartitionID = ...
// Get the triplets using GraphX, then use Spark to repartition them
val partitionedEdges
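The excerpt cuts off at val partitionedEdges. A plausible completion,
offered only as a reconstruction of the stated intent (key each triplet by
its target partition, shuffle with a HashPartitioner, then rebuild the
graph), not as the rest of Ankur's message:

import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._  // pair-RDD implicits on Spark 1.x
import org.apache.spark.graphx._

// Reconstruction sketch: send each triplet to the partition chosen by
// getTripletPartition, then rebuild an edge graph from the result.
val partitionedEdges = unpartitionedGraph.triplets
  .map(t => (getTripletPartition(t), Edge(t.srcId, t.dstId, t.attr)))
  .partitionBy(new HashPartitioner(numPartitions))
  .map(_._2)
val partitionedGraph = Graph.fromEdges(partitionedEdges, defaultValue = 0)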
What advice would you give on partitioning, considering the procedure
Spark adopts for graph processing?
Any advice is much appreciated.
Best Regards,
Larry Xiao
References:
Bipartite-oriented Distributed Graph Partitioning for Big Learning.
PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs.