Hi Roy,
Thanks for your help. I wrote a small code snippet that reproduces the
problem.
Could you read through it and see whether I did anything wrong?
Thanks!

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx._

def main(args: Array[String]) {
  val conf = new SparkConf().setAppName("TEST")
    .setMaster("local[4]")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "edu.nd.dsg.hdtm.util.HDTMKryoRegistrator")
  val sc = new SparkContext(conf)

  // A three-vertex cycle: 0 -> 1 -> 2 -> 0
  val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
  val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L),
    Edge(2L, 0L, 2L)))
  val newGraph = Graph(v, e)
  var currentGraph = newGraph
  val vertexIds = currentGraph.vertices.map(_._1).collect()

  for (i <- 1 to 1000) {
    var g = currentGraph
    vertexIds.toStream.foreach(id => {
      // Rebuild the graph from the current vertices and edges, then
      // cache and materialize it
      g = Graph(currentGraph.vertices, currentGraph.edges)
      g.cache()
      g.edges.cache()
      g.vertices.cache()
      g.vertices.count()
      g.edges.count()
    })
    // Release the previous iteration's graph
    currentGraph.unpersistVertices(blocking = false)
    currentGraph.edges.unpersist(blocking = false)
    currentGraph = g
    println(" iter " + i + " finished")
  }
}

Baoxu Shi(Dash)
Computer Science and Engineering Department
University of Notre Dame
[email protected]
> On Jun 19, 2014, at 1:47 AM, roy20021 [via Apache Spark User List]
> <[email protected]> wrote:
>
> Not sure if this helps, but:
> Checkpointing cuts the lineage. The checkpoint method only sets a flag. For
> the checkpoint to actually be performed, you must NOT materialise the RDD
> before it has been flagged; otherwise the flag is just ignored.
>
> rdd2 = rdd1.map(..)
> rdd2.checkpoint()
> rdd2.count
> rdd2.isCheckpointed // true
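
To make sure I follow the ordering above, here is a minimal sketch on a plain RDD. It assumes a throwaway local checkpoint directory (on a cluster this would presumably be an HDFS path); the object and method names are only illustrative and not part of my actual code.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

object CheckpointOrderSketch {
  // Flags the RDD for checkpointing BEFORE the first action, then materialises it.
  // Returns whatever isCheckpointed reports afterwards.
  def flagThenMaterialise(sc: SparkContext): Boolean = {
    val rdd1 = sc.parallelize(1 to 100)
    val rdd2 = rdd1.map(_ + 1)
    rdd2.checkpoint() // only sets a flag, nothing is written yet
    rdd2.count()      // first action: the checkpoint is written as part of this job
    rdd2.isCheckpointed
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("checkpoint-order-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a temporary local directory is good enough for a local test.
      // checkpoint() needs a checkpoint directory to be set beforehand.
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)
      println("isCheckpointed = " + flagThenMaterialise(sc))
    } finally {
      sc.stop()
    }
  }
}
```

Note that sc.setCheckpointDir has to be called before checkpoint(); the quoted snippet leaves that step out.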
>
> On Wednesday, 18 June 2014, dash <[hidden email]> wrote:
> > If an RDD object has non-empty .dependencies, does that mean it has
> > lineage? How can I remove it?
> >
> > I'm doing iterative computation where each iteration depends on the result
> > of the previous one. After several iterations, it throws a
> > StackOverflowError.
> >
> > At first I tried cache. I read the code in Pregel.scala, which is part of
> > GraphX; it calls count to materialize the object after cache. But when I
> > attached a debugger, that approach did not seem to empty .dependencies,
> > and it does not work in my code either.
> >
> > The alternative is checkpoint. I tried checkpointing the vertices and
> > edges of my Graph object and then materializing them by counting vertices
> > and edges. I then used .isCheckpointed to check whether the checkpoint
> > succeeded, but it always returns false.
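
For the iterative case described above, this is roughly the pattern I have in mind, sketched on a plain RDD rather than a Graph. The helper names (lineageDepth, run) and the checkpointEvery parameter are made up for illustration, and I have not confirmed that the same approach carries over to GraphX vertices and edges.

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeCheckpointSketch {
  // Walks the chain of first dependencies and counts how many RDDs sit behind `rdd`.
  def lineageDepth(rdd: RDD[_]): Int = {
    var depth = 0
    var current: RDD[_] = rdd
    while (current.dependencies.nonEmpty) {
      current = current.dependencies.head.rdd
      depth += 1
    }
    depth
  }

  // Each iteration derives a new RDD from the previous one.
  // Every `checkpointEvery` iterations the new RDD is flagged before its first action.
  def run(sc: SparkContext, iterations: Int, checkpointEvery: Int): RDD[Long] = {
    var current: RDD[Long] = sc.parallelize(Seq(0L, 1L, 2L))
    for (i <- 1 to iterations) {
      val next = current.map(_ + 1L).cache()
      if (i % checkpointEvery == 0) next.checkpoint() // flag first
      next.count()                                    // then materialise
      current.unpersist(blocking = false)             // drop the previous iteration's cache
      current = next
    }
    current
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("iterative-checkpoint-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Assumption: a temporary local directory as the checkpoint location.
      sc.setCheckpointDir(Files.createTempDirectory("spark-checkpoint").toString)
      val result = run(sc, iterations = 20, checkpointEvery = 5)
      println("lineage depth after 20 iterations: " + lineageDepth(result))
    } finally {
      sc.stop()
    }
  }
}
```

The idea is that cache alone keeps the full chain of dependencies, while a checkpointed RDD depends only on its checkpoint data, so the chain should never grow longer than the checkpoint interval.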
> >
> >
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779p7893.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.