This is definitely useful, but in practice it might be very difficult to do.


On Mon, Aug 29, 2016 at 6:46 PM, Fang Zhang <fang.zhang...@gmail.com> wrote:

> Dear developers,
>
> I am running some tests using Pregel API.
>
> It seems to me that more than 90% of the volume of a graph object is
> made up of index structures that do not change during the execution of
> Pregel. When the graph is too large to fit in memory, Pregel persists the
> intermediate graph to disk on every iteration, which seems to involve a lot
> of repeated disk writes.
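> 
> Roughly, as I understand it, the main loop of GraphX's Pregel looks something
> like the sketch below (a paraphrase only; I dropped the active-set handling
> and logging, and the details vary by Spark version). It shows why a complete
> new Graph is built and cached on every iteration:
> 
>   import scala.reflect.ClassTag
>   import org.apache.spark.graphx._
> 
>   // Simplified paraphrase of the Pregel loop, not the exact library source.
>   def pregelSketch[VD: ClassTag, ED: ClassTag, A: ClassTag](
>       graph: Graph[VD, ED], initialMsg: A, maxIterations: Int)(
>       vprog: (VertexId, VD, A) => VD,
>       sendMsg: EdgeContext[VD, ED, A] => Unit,
>       mergeMsg: (A, A) => A): Graph[VD, ED] = {
>     var g = graph.mapVertices((vid, vd) => vprog(vid, vd, initialMsg)).cache()
>     var messages = g.aggregateMessages[A](sendMsg, mergeMsg).cache()
>     var activeMessages = messages.count()
>     var i = 0
>     while (activeMessages > 0 && i < maxIterations) {
>       val prevG = g
>       // A whole new Graph (vertices, edges and replicated views) is built
>       // and cached here, although the edge structure and its indices never
>       // change across iterations.
>       g = g.joinVertices(messages)(vprog).cache()
>       val oldMessages = messages
>       messages = g.aggregateMessages[A](sendMsg, mergeMsg).cache()
>       activeMessages = messages.count()
>       // The previous iteration's full graph is then unpersisted / spilled.
>       oldMessages.unpersist(blocking = false)
>       prevG.unpersistVertices(blocking = false)
>       prevG.edges.unpersist(blocking = false)
>       i += 1
>     }
>     messages.unpersist(blocking = false)
>     g
>   }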
>
> In my test (Shortest Path), I save only one copy of the initial graph and
> maintain only a var of RDD[(VertexId, VD)]. To create new messages, I build
> a new graph in each iteration from the updated RDD[(VertexId, VD)] and the
> fixed data of the initial graph. Using a slow NTFS hard drive, I observed
> around a 40% overall improvement. Note that my updateVertices (corresponding
> to joinVertices) and edges.upgrade are not optimized yet (they can be
> optimized following the flow of GraphX), so the improvement should come from
> reduced I/O.
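> 
> A schematic version of my modified loop is below. It is only a sketch:
> pregelLite is a made-up name for this mail, and it uses the plain
> Graph(verts, edges) constructor plus a leftOuterJoin instead of my (still
> unoptimized) updateVertices / edges.upgrade helpers, so it still rebuilds the
> edge partitions each iteration. The point is only that the cached edge data
> of the initial graph is never persisted again:
> 
>   import scala.reflect.ClassTag
>   import org.apache.spark.graphx._
>   import org.apache.spark.rdd.RDD
> 
>   // Keep the edges of the initial graph fixed and rebuild/persist only the
>   // small RDD of changing vertex attributes between iterations.
>   def pregelLite[VD: ClassTag, ED: ClassTag, A: ClassTag](
>       initialGraph: Graph[VD, ED], initialMsg: A, maxIterations: Int)(
>       vprog: (VertexId, VD, A) => VD,
>       sendMsg: EdgeContext[VD, ED, A] => Unit,
>       mergeMsg: (A, A) => A): RDD[(VertexId, VD)] = {
>     val edges = initialGraph.edges.cache()   // fixed part, cached only once
>     var verts: RDD[(VertexId, VD)] = initialGraph.vertices
>       .map { case (vid, vd) => (vid, vprog(vid, vd, initialMsg)) }.cache()
>     var i = 0
>     var done = false
>     while (!done && i < maxIterations) {
>       // Rebuild a Graph from the fixed edges and the updated vertex RDD.
>       val g = Graph(verts, edges)
>       val messages = g.aggregateMessages[A](sendMsg, mergeMsg).cache()
>       if (messages.count() == 0) {
>         done = true
>       } else {
>         // Apply vprog only where a message arrived; keep old values elsewhere.
>         val newVerts = verts.leftOuterJoin(messages).map {
>           case (vid, (vd, Some(msg))) => (vid, vprog(vid, vd, msg))
>           case (vid, (vd, None))      => (vid, vd)
>         }.cache()
>         newVerts.count()                     // materialize, then drop old copy
>         verts.unpersist(blocking = false)
>         verts = newVerts
>       }
>       messages.unpersist(blocking = false)
>       i += 1
>     }
>     verts
>   }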
>
> So my question is: do you think the current flow of Pregel could be
> improved by persisting only a small portion of a large Graph object? If
> there are other concerns, could you explain them?
>
> Best regards,
> Fang
>
>
>
