Jeffrey Picard <jp3...@columbia.edu> writes:
> I tried unpersisting the edges and vertices of the graph by hand, then
> persisting the graph with persist(StorageLevel.MEMORY_AND_DISK). I still see
> the same behavior in connected components however, and the same thing you
> described in the storage page.

Unfortunately it's not possible to change the graph's storage level by hand 
without modifying GraphX itself, because internally GraphX will create new 
RDDs, persist them using MEMORY_ONLY, and immediately materialize them, all 
before you get a chance to change the storage level. You can see this happening 
on the Storage page: one graph (a VertexRDD and an EdgeRDD) shows the desired 
storage level, while the newly-created RDDs are still at MEMORY_ONLY.
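For reference, here is a sketch of the manual workaround described above and why it 
doesn't stick. The `sc` context and the edge-list path are assumed; this mirrors the 
Spark 1.0.x GraphX API:

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Load the graph; in 1.0.x, GraphLoader.edgeListFile persists at MEMORY_ONLY.
val graph = GraphLoader.edgeListFile(sc, "edges.txt")

// Unpersist the vertices and edges by hand, then re-persist at MEMORY_AND_DISK.
graph.vertices.unpersist(blocking = false)
graph.edges.unpersist(blocking = false)
val g2 = graph.persist(StorageLevel.MEMORY_AND_DISK)

// The problem: any subsequent operation creates new internal RDDs that GraphX
// persists at MEMORY_ONLY and materializes immediately, so the MEMORY_AND_DISK
// setting never reaches them.
val cc = g2.connectedComponents()
```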

> It seems that the version of graphx I’m using doesn't have the option for
> setting the storage level in the GraphLoader.edgeListFile method.
> https://spark.apache.org/docs/1.0.1/api/scala/index.html#org.apache.spark.graphx.GraphLoader$
> [...]
> Would that (newer?) version of GraphX with the storage level settable in the
> edgeListFile possibly solve this, or could there still be something else going
> on?

Yes, it looks like custom storage levels would solve the problem. That was 
added in apache/spark#946 [1], which will be released as part of Spark 1.1.0. 
Until then, is it possible for you to rebuild Spark from the master branch?
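Once you're on a build that includes that PR, loading with custom storage levels 
should look roughly like this (parameter names as added in apache/spark#946; `sc` 
and the path are assumed):

```scala
import org.apache.spark.graphx.GraphLoader
import org.apache.spark.storage.StorageLevel

// Set the storage level at load time so GraphX's internal RDDs inherit it.
val graph = GraphLoader.edgeListFile(sc, "edges.txt",
  edgeStorageLevel = StorageLevel.MEMORY_AND_DISK,
  vertexStorageLevel = StorageLevel.MEMORY_AND_DISK)

val cc = graph.connectedComponents()
```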

Ankur

[1] https://github.com/apache/spark/pull/946
