I'm trying to run a greedy graph partitioning algorithm on top of Pregel. The algorithm is pretty simple:
    val InitialMsg = 9999
    val NoGroup = -1

    def mergeMsg(msg1: Int, msg2: Int): Int = {
      msg1 min msg2
    }

    def vprog(vertexId: VertexId, value: Long, message: Int): Long = {
      if (message == InitialMsg) value else message
    }

    def sendMsg(triplet: EdgeTriplet[Long, Boolean]): Iterator[(VertexId, Int)] = {
      if (triplet.srcAttr == NoGroup) {
        Iterator.empty
      } else if (triplet.dstAttr != NoGroup) {
        Iterator.empty
      } else {
        Iterator((triplet.dstId, triplet.srcAttr.toInt))
      }
    }

Before running it I assign group ids (1, 2, 3, ..., n) to random vertices of a big graph; the rest remain without a group (the value is -1).

The partitioning itself works, but the problem is that each job is slower than the previous one. At the beginning I understand why it may get slower: the number of active vertices grows up to some point. After that, though, the runtime of each job should decrease as the number of active vertices shrinks toward zero. In reality it only grows. I can run 80 iterations on the same graph in 1 minute, while 150 iterations take 17 minutes, and I expect 300 would take hours. As I understand the algorithm, the cost of each successive job should not increase, but the numbers show that it does.

I see this behaviour on Spark 1.6.0 and 1.6.1.

What is wrong? If you need more info, let me know.

Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-of-algorithm-on-top-of-Pregel-non-linearly-depends-on-the-number-of-iterations-tp26760.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
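For context, here is roughly how the functions above would be wired into GraphX's Pregel operator. This is a sketch, not my exact driver code: `graph: Graph[Long, Boolean]` and the number of seed groups `n` are assumptions, and the random seeding shown is only an approximation of the initialization I described.

```scala
import org.apache.spark.graphx._

// Assumed inputs: a loaded graph with Long vertex attributes and
// Boolean edge attributes, plus the desired number of groups n.
// graph: Graph[Long, Boolean]; n: Int

// Pick n random vertices and give them group ids 1..n;
// every other vertex starts with NoGroup (-1).
val seedIds = graph.vertices.takeSample(false, n).map(_._1).zipWithIndex.toMap
val seeded: Graph[Long, Boolean] =
  graph.mapVertices((id, _) =>
    seedIds.get(id).map(_ + 1L).getOrElse(NoGroup.toLong))

// Run the greedy flood-fill: messages only flow along out-edges,
// and each un-grouped vertex adopts the smallest group id it receives.
val partitioned: Graph[Long, Boolean] =
  seeded.pregel(InitialMsg, maxIterations = 300, EdgeDirection.Out)(
    vprog, sendMsg, mergeMsg)
```

Note that `pregel` here is the standard GraphX operator from `GraphOps`, so the vertex program, send function, and merge function are exactly the three definitions above.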