Hello,

Thanks for your reply. From the code I found that the reason my program
scans all the edges is that the EdgeDirection I passed in is
EdgeDirection.Either.
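
Switching to a more restrictive direction fixed that. A minimal sketch of
the change, assuming messages only flow along out-edges (the graph, the
attribute types, and the vertex/send/merge functions below are
placeholders, not my real job):

    import org.apache.spark.graphx._

    // Sketch only: `graph` is assumed to be a Graph[Double, Double] built
    // elsewhere. Declaring activeDirection = EdgeDirection.Out lets GraphX
    // restrict work to clusters of edges whose source vertex is active;
    // the default, EdgeDirection.Either, falls back to a full edge scan.
    val result = Pregel(graph, initialMsg = Double.NegativeInfinity,
                        activeDirection = EdgeDirection.Out)(
      vprog = (id, attr, msg) => math.max(attr, msg),
      sendMsg = triplet =>
        if (triplet.srcAttr > triplet.dstAttr)
          Iterator((triplet.dstId, triplet.srcAttr))
        else Iterator.empty,
      mergeMsg = (a, b) => math.max(a, b))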

However, I still have the problem that the time consumed by each iteration
does not decrease as the active set shrinks. Thus I have two questions:

1. What is the meaning of activeFraction in [1]? (The part I am asking
about is paraphrased below, after the link.)
2. As my edge RDD is too large to cache in memory, I persist it at
StorageLevel.MEMORY_AND_DISK_SER (see the sketch right after this list).
If the program takes the "aggregateMessagesIndexScan" path, will it still
have to load the whole edge list into memory?
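
For reference, this is roughly how the graph is loaded and persisted (the
path and partition count here are placeholders):

    import org.apache.spark.graphx.GraphLoader
    import org.apache.spark.storage.StorageLevel

    // Keep both edges and vertices serialized, spilling to disk on demand.
    val graph = GraphLoader.edgeListFile(sc, "hdfs:///path/to/edges",
      numEdgePartitions = 256,
      edgeStorageLevel = StorageLevel.MEMORY_AND_DISK_SER,
      vertexStorageLevel = StorageLevel.MEMORY_AND_DISK_SER)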

[1]
https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L237-266
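
My reading of [1], paraphrased (names and exact shape may differ from
master, so treat this as a sketch rather than the actual code):

    // activeFraction = active vertices in this partition / distinct
    // source vertices in the partition's clustered index.
    val activeFraction =
      edgePartition.numActives.getOrElse(0) / edgePartition.indexSize.toFloat

    activeDirectionOpt match {
      case Some(EdgeDirection.Out) =>
        if (activeFraction < 0.8) {
          // Few active sources: jump through the clustered index.
          edgePartition.aggregateMessagesIndexScan(
            sendMsg, mergeMsg, tripletFields, EdgeActiveness.SrcOnly)
        } else {
          // Mostly active: one sequential pass over all edges is cheaper.
          edgePartition.aggregateMessagesEdgeScan(
            sendMsg, mergeMsg, tripletFields, EdgeActiveness.SrcOnly)
        }
      case Some(EdgeDirection.Either) =>
        // The index only covers source ids, so "either" cannot use it:
        // scan all edges and filter.
        edgePartition.aggregateMessagesEdgeScan(
          sendMsg, mergeMsg, tripletFields, EdgeActiveness.Either)
      case _ => // In / Both / None cases elided; see [1]
    }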

Alcaid


2015-04-10 2:47 GMT+08:00 Ankur Dave <ankurd...@gmail.com>:

> Actually, GraphX doesn't need to scan all the edges, because it
> maintains a clustered index on the source vertex id (that is, it sorts
> the edges by source vertex id and stores the offsets in a hash table).
> If the activeDirection is appropriately set, it can then jump only to
> the clusters with active source vertices.
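>
> A toy sketch of that layout (simplified, not the actual EdgePartition
> code): edges sorted by source id, plus a map from each source id to
> the offset of its first edge.
>
>     case class ClusteredEdges(srcIds: Array[Long], dstIds: Array[Long],
>                               index: Map[Long, Int]) {
>       // Visit only the edges whose source vertex is active: one index
>       // lookup per active source, then a short sequential run.
>       def edgesFrom(activeSrcs: Set[Long]): Iterator[(Long, Long)] =
>         activeSrcs.iterator.flatMap { src =>
>           index.get(src).iterator.flatMap { start =>
>             Iterator.iterate(start)(_ + 1)
>               .takeWhile(i => i < srcIds.length && srcIds(i) == src)
>               .map(i => (srcIds(i), dstIds(i)))
>           }
>         }
>     }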
>
> See the EdgePartition#index field [1], which stores the offsets, and
> the logic in GraphImpl#aggregateMessagesWithActiveSet [2], which
> decides whether to do a full scan or use the index.
>
> [1]
> https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/EdgePartition.scala#L60
> [2]
> https://github.com/apache/spark/blob/master/graphx/src/main/scala/org/apache/spark/graphx/impl/GraphImpl.scala#L237-266
>
> Ankur
>
>
> On Thu, Apr 9, 2015 at 3:21 AM, James <alcaid1...@gmail.com> wrote:
> > In aggregateMessagesWithActiveSet, Spark still has to read all edges.
> > This means a fixed cost that scales with the graph size is unavoidable
> > in a Pregel-like iteration.
> >
> > But what if I have to run nearly 100 iterations, and in the last 50
> > iterations fewer than 0.1% of the vertices need to be updated? That
> > fixed cost makes the program finish in an unacceptable amount of time.
>
