Hello, Great thanks for your reply. From the code I found that the reason why my program will scan all the edges is becasue of the EdgeDirection I passed into is EdgeDirection.Either.
However I still met the problem of "Time consuming of each iteration will not decrease by time". Thus I have two questions: 1. what is the meaning of activeFraction in [1] 2. As my edgeRDD is too large to cache into memory, I used StorageLevel.MEMORY_AND_DISK_SER as persist level. thus if the program used "aggregateMessagesIndexScan", will the program still have to load all edge list into the memory? [1] https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L237-266 Alcaid 2015-04-10 2:47 GMT+08:00 Ankur Dave <ankurd...@gmail.com>: > Actually, GraphX doesn't need to scan all the edges, because it > maintains a clustered index on the source vertex id (that is, it sorts > the edges by source vertex id and stores the offsets in a hash table). > If the activeDirection is appropriately set, it can then jump only to > the clusters with active source vertices. > > See the EdgePartition#index field [1], which stores the offsets, and > the logic in GraphImpl#aggregateMessagesWithActiveSet [2], which > decides whether to do a full scan or use the index. > > [1] > https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala#L60 > [2]. > https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L237-266 > > Ankur > > > On Thu, Apr 9, 2015 at 3:21 AM, James <alcaid1...@gmail.com> wrote: > > In aggregateMessagesWithActiveSet, Spark still have to read all edges. It > > means that a fixed time which scale with graph size is unavoidable on a > > pregel-like iteration. > > > > But what if I have to iterate nearly 100 iterations but at the last 50 > > iterations there are only < 0.1% nodes need to be updated ? The fixed > time > > make the program finished at a unacceptable time consumption. >