This is definitely useful, but in reality it might be very difficult to do.
On Mon, Aug 29, 2016 at 6:46 PM, Fang Zhang <fang.zhang...@gmail.com> wrote:
> Dear developers,
>
> I am running some tests using the Pregel API.
>
> It seems to me that more than 90% of the volume of a graph object is
> composed of index structures that will not change during the execution of
> Pregel. When a graph is too large to fit in memory, Pregel persists the
> intermediate graph to disk on every iteration, which seems to involve a
> lot of repeated disk writes.
>
> In my test (Shortest Path), I save only one copy of the initial graph and
> maintain only a var of RDD[(VertexId, VD)]. To create new messages, I
> create a new graph from the updated RDD[(VertexId, VD)] and the fixed data
> in the initial graph during each iteration. Using a slow NTFS hard drive,
> I did observe around a 40% overall improvement. Note that my
> updateVertices (corresponding to joinVertices) and edges.upgrade are not
> optimized yet (they could be optimized following the flow of GraphX), so
> the improvement should come from I/O.
>
> So my question is: do you think the current flow of Pregel could be
> improved by saving only a small portion of a large Graph object? If there
> are other concerns, could you explain them?
>
> Best regards,
> Fang
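For what it's worth, here is a rough sketch (in Scala) of the kind of loop described above: cache the structural data once, keep only an RDD of vertex attributes across supersteps, and rebuild the graph from the fixed edges each iteration. This is only my guess at what such a modification might look like, not the actual Pregel implementation in GraphX; the object and method names (LightweightSSSP.run) are made up, and the graph reconstruction and lineage handling are deliberately left unoptimized, much like the updateVertices/edges.upgrade steps mentioned above.

import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical names, just to keep the sketch self-contained.
object LightweightSSSP {

  // graph: edge attributes are weights; sourceId: the source vertex.
  def run(graph: Graph[Double, Double], sourceId: VertexId,
          maxIter: Int = 20): RDD[(VertexId, Double)] = {

    // Cache the graph (edges and index structures) exactly once; it never changes.
    val base: Graph[Double, Double] = graph
      .mapVertices((id, _) => if (id == sourceId) 0.0 else Double.PositiveInfinity)
      .cache()

    // The only per-iteration state that gets persisted: the small vertex RDD.
    var dist: RDD[(VertexId, Double)] = base.vertices
    var active = true
    var i = 0

    while (active && i < maxIter) {
      // Rebuild a graph from the fixed edges and the current vertex attributes.
      // (Unoptimized: this re-shuffles the edges; a real version would reuse the
      // existing edge partitions the way joinVertices does.)
      val cur: Graph[Double, Double] = Graph(dist, base.edges).cache()

      // One superstep of shortest paths: relax every edge, keep the minimum.
      val msgs: VertexRDD[Double] = cur.aggregateMessages[Double](
        ctx => if (ctx.srcAttr + ctx.attr < ctx.dstAttr) ctx.sendToDst(ctx.srcAttr + ctx.attr),
        (a, b) => math.min(a, b))

      active = msgs.count() > 0

      // Merge the messages into the vertex attributes; persist only this RDD.
      val newDist = dist
        .leftOuterJoin(msgs)
        .mapValues { case (old, m) => math.min(old, m.getOrElse(Double.PositiveInfinity)) }
        .persist(StorageLevel.MEMORY_AND_DISK)
      newDist.count()  // materialize before dropping the previous copy

      dist.unpersist(blocking = false)
      cur.unpersist(blocking = false)
      dist = newDist
      i += 1
      // (Lineage grows across iterations; a real version would also checkpoint periodically.)
    }
    dist
  }
}

The point of the sketch is that only the small RDD[(VertexId, Double)] is persisted per superstep, while the edge/index data is cached once up front; the hard part, and probably why this is difficult to do in general inside Pregel, is rebuilding the graph each iteration without re-shuffling the edges and without letting the lineage grow unbounded.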