Hello,
Thanks a lot for your reply. From the code, I found that the reason my
program scans all the edges is that the EdgeDirection I passed in is
EdgeDirection.Either.
However, I still have the problem that "the time taken by each iteration
does not decrease over time". Thus I have two
Actually, GraphX doesn't need to scan all the edges, because it
maintains a clustered index on the source vertex id (that is, it sorts
the edges by source vertex id and stores the offsets in a hash table).
If the activeDirection is appropriately set, it can then jump only to
the clusters with activ
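To make the clustered-index idea concrete, here is a small pure-Python model. It is not GraphX code, and the function names are hypothetical. Edges are sorted by source ID and a dict maps each source ID to the offset of its first edge, so an Out-style scan jumps straight to the clusters of active sources. A source-keyed index cannot locate edges whose destination is active, so in this model an Either-style scan has to read every edge and filter it, which is consistent with the full scan observed with EdgeDirection.Either.

```python
def build_clustered_index(edges):
    """Sort (src, dst) edges by source ID and record, for each source ID,
    the offset of its first edge in the sorted list."""
    edges = sorted(edges)  # tuples sort by src first, then dst
    index = {}
    for offset, (src, _dst) in enumerate(edges):
        index.setdefault(src, offset)  # keep only the first offset per source
    return edges, index


def scan_out(edges, index, active):
    """Out-style scan: jump to each active source's cluster and read only it.
    Returns (matching edges, number of edges read)."""
    matched = []
    for src in sorted(active):
        pos = index.get(src)
        if pos is None:
            continue  # active vertex with no outgoing edges
        while pos < len(edges) and edges[pos][0] == src:
            matched.append(edges[pos])
            pos += 1
    return matched, len(matched)


def scan_either(edges, active):
    """Either-style scan: an edge whose destination is active cannot be found
    through a source-keyed index, so every edge is read and filtered."""
    matched = [e for e in edges if e[0] in active or e[1] in active]
    return matched, len(edges)


edges, index = build_clustered_index([(3, 4), (1, 2), (4, 1), (1, 3), (3, 1), (2, 3)])
print(scan_out(edges, index, {3}))   # ([(3, 1), (3, 4)], 2)
print(scan_either(edges, {3}))       # ([(1, 3), (2, 3), (3, 1), (3, 4)], 6)
```

With vertex 3 active, the Out-style scan reads 2 of the 6 edges, while the Either-style scan reads all 6 to find its 4 matches.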
In aggregateMessagesWithActiveSet, Spark still has to read all edges. This
means that a fixed cost, which scales with graph size, is unavoidable in
every Pregel-like iteration.
But what if I have to run nearly 100 iterations, and in the last 50
iterations fewer than 0.1% of the nodes need to be updated?
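A rough cost model of this question, under purely hypothetical assumptions: 1,000,000 edges, 50 iterations in which every edge is active, then 50 iterations in which only 0.1% of the edges are active (this assumes active edges scale with active nodes, which need not hold). A full scan pays for the whole edge count in every iteration, while an index scan pays only for active edges. The helper edges_read is made up for illustration.

```python
def edges_read(total_edges, active_edges_per_iter, full_scan):
    """Count edge reads over a sequence of iterations.

    full_scan=True models reading every edge in each iteration (a fixed cost
    that scales with graph size); full_scan=False models an index scan that
    reads only the edges of active vertices.
    """
    if full_scan:
        return total_edges * len(active_edges_per_iter)
    return sum(active_edges_per_iter)


E = 1_000_000  # hypothetical edge count
# Hypothetical workload: 50 dense iterations, then 50 iterations at 0.1% activity.
workload = [E] * 50 + [E // 1000] * 50

tail = workload[50:]  # the last 50 iterations
print(edges_read(E, tail, full_scan=True))   # 50000000
print(edges_read(E, tail, full_scan=False))  # 50000
```

Under these assumptions, the sparse tail costs as much as the dense head when every edge is read, which is exactly the fixed per-iteration cost the question is about.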
We thought it would be better to simplify the interface, since the
active set is a performance optimization but the result is identical
to calling subgraph before aggregateMessages.
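The equivalence claimed in the previous paragraph can be checked on a toy model. The pure-Python sketch below uses hypothetical helpers (aggregate, aggregate_with_active_set, subgraph) that only imitate the GraphX operators on a list of (src, dst) pairs. One caveat from GraphX itself: subgraph keeps an edge only when both endpoints satisfy vpred, so an Either-style active set corresponds to an edge predicate rather than a vertex predicate. In the model, both variants still read every edge to apply the filter; only the result is identical, not necessarily the cost.

```python
import operator


def aggregate(edges, send, merge):
    """Model of aggregateMessages: send(edge) yields (vertex, msg) pairs and
    merge(a, b) combines messages addressed to the same vertex."""
    out = {}
    for edge in edges:
        for vid, msg in send(edge):
            out[vid] = merge(out[vid], msg) if vid in out else msg
    return out


def aggregate_with_active_set(edges, send, merge, active):
    """Active-set model (Either-style): skip edges with no active endpoint."""
    return aggregate([e for e in edges if e[0] in active or e[1] in active], send, merge)


def subgraph(edges, epred):
    """Model of subgraph with an edge predicate: keep edges where epred holds."""
    return [e for e in edges if epred(e)]


def send_one(edge):
    """Send the message 1 to the destination, i.e. count incoming messages."""
    return [(edge[1], 1)]


edges = [(1, 2), (1, 3), (2, 3), (3, 1), (3, 4), (4, 1)]
active = {3}

via_active_set = aggregate_with_active_set(edges, send_one, operator.add, active)
via_subgraph = aggregate(
    subgraph(edges, lambda e: e[0] in active or e[1] in active), send_one, operator.add
)
print(via_active_set == via_subgraph)  # True
```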
The active set option is still there in the package-private method
aggregateMessagesWithActiveSet. You can actually
Hello,
The old GraphX API "mapReduceTriplets" has an optional parameter
"activeSetOpt: Option[(VertexRDD[_]" that limits the input of sendMessage.
However, I could not find this option in the new API "aggregateMessages".
Why is it no longer offered?
Alcaid