I'm trying to run a greedy graph partitioning algorithm on top of Pregel. The algorithm is pretty simple:
    val InitialMsg = 9999
    val NoGroup = -1

    def mergeMsg(msg1: Int, msg2: Int): Int = {
      msg1 min msg2
    }

    def vprog(vertexId: VertexId, value: Long, message: Int): Long = {
      if (message == InitialMsg) value else message
    }

    def sendMsg(triplet: EdgeTriplet[Long, Boolean]): Iterator[(VertexId, Int)] = {
      if (triplet.srcAttr == NoGroup) {
        Iterator.empty
      } else if (triplet.dstAttr != NoGroup) {
        Iterator.empty
      } else {
        Iterator((triplet.dstId, triplet.srcAttr.toInt))
      }
    }

Before running it I assign group ids (1, 2, 3, ..., n) to random vertices of a big graph; the rest remain without a group (the value is -1).

The partitioning itself works, but the problem is that each job is slower than the previous one. At the beginning I understand why it may get slower: the number of active vertices grows up to some point. After that, though, the runtime of each job should decrease as the number of active vertices shrinks toward zero. In reality it only grows. I can run 80 iterations on the same graph in 1 minute, while 150 iterations take 17 minutes, and I expect 300 would take hours. As I understand the algorithm, the cost of each successive job should not increase, but the numbers show that it does.

I see this behaviour on Spark 1.6.0 and 1.6.1.

What is wrong? If you need more info, let me know.

Thank you

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Performance-of-algorithm-on-top-of-Pregel-non-linearly-depends-on-the-number-of-iterations-tp26760.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
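For context, here is roughly how the functions above would be wired into GraphX's Pregel operator. This is a sketch, not my exact driver code: `graph: Graph[Long, Boolean]` and the number of seed groups `n` are assumptions, and the random seeding shown is only an approximation of the initialization I described.

```scala
import org.apache.spark.graphx._

// Assumed inputs: a loaded graph with Long vertex attributes and
// Boolean edge attributes, plus the desired number of groups n.
// graph: Graph[Long, Boolean]; n: Int

// Pick n random vertices and give them group ids 1..n;
// every other vertex starts with NoGroup (-1).
val seedIds = graph.vertices.takeSample(false, n).map(_._1).zipWithIndex.toMap
val seeded: Graph[Long, Boolean] =
  graph.mapVertices((id, _) =>
    seedIds.get(id).map(_ + 1L).getOrElse(NoGroup.toLong))

// Run the greedy flood-fill: messages only flow along out-edges,
// and each un-grouped vertex adopts the smallest group id it receives.
val partitioned: Graph[Long, Boolean] =
  seeded.pregel(InitialMsg, maxIterations = 300, EdgeDirection.Out)(
    vprog, sendMsg, mergeMsg)
```

Note that `pregel` here is the standard GraphX operator from `GraphOps`, so the vertex program, send function, and merge function are exactly the three definitions above.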