Hi Roy,

Thanks for your help. I wrote a small code snippet that reproduces the problem. Could you read through it and see whether I did anything wrong?
Thanks!

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

def main(args: Array[String]) {
  val conf = new SparkConf().setAppName("TEST")
    .setMaster("local[4]")
    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryo.registrator", "edu.nd.dsg.hdtm.util.HDTMKryoRegistrator")
  val sc = new SparkContext(conf)

  // Build a small three-vertex, three-edge graph.
  val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
  val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), Edge(2L, 0L, 2L)))
  val newGraph = Graph(v, e)
  var currentGraph = newGraph
  val vertexIds = currentGraph.vertices.map(_._1).collect()

  for (i <- 1 to 1000) {
    var g = currentGraph
    // For each vertex id, rebuild the graph from the previous iteration's RDDs,
    // cache it, and materialise it with count().
    vertexIds.toStream.foreach(id => {
      g = Graph(currentGraph.vertices, currentGraph.edges)
      g.cache()
      g.edges.cache()
      g.vertices.cache()
      g.vertices.count()
      g.edges.count()
    })
    // Unpersist the previous iteration's graph and carry the new one forward.
    currentGraph.unpersistVertices(blocking = false)
    currentGraph.edges.unpersist(blocking = false)
    currentGraph = g
    println(" iter " + i + " finished")
  }
}

Baoxu Shi (Dash)
Computer Science and Engineering Department
University of Notre Dame
b...@nd.edu

> On Jun 19, 2014, at 1:47 AM, roy20021 [via Apache Spark User List]
> <ml-node+s1001560n7892...@n3.nabble.com> wrote:
>
> Not sure if it can help, but: checkpoint cuts the lineage. The checkpoint
> method only sets a flag. To actually perform the checkpoint, you must NOT
> materialise the RDD before it has been flagged; otherwise the flag is
> simply ignored.
>
> rdd2 = rdd1.map(..)
> rdd2.checkpoint()
> rdd2.count
> rdd2.isCheckpointed // true
>
> On Wednesday, June 18, 2014, dash <[hidden email]> wrote:
> > If an RDD object has non-empty .dependencies, does that mean it has
> > lineage? How can I remove it?
> >
> > I'm doing iterative computation, and each iteration depends on the result
> > computed in the previous iteration. After several iterations, it throws a
> > StackOverflowError.
> >
> > At first I tried caching. I read the code in pregel.scala, which is part
> > of GraphX; it uses a count to materialize the object after caching. But I
> > attached a debugger, and that approach does not seem to empty
> > .dependencies, and it also does not work in my code.
> >
> > The alternative approach is checkpointing. I tried checkpointing the
> > vertices and edges of my Graph object and then materializing them by
> > counting the vertices and edges. I then used .isCheckpointed to check
> > whether it was correctly checkpointed, but it always returns false.
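For reference, this is the ordering I understand from your example, applied to the two member RDDs of a Graph (using the same v and e as in my snippet above). It is only a rough sketch: it assumes sc.setCheckpointDir has been called (the path below is just a placeholder), and I am not sure flagging the member RDDs like this is enough to cut GraphX's internal lineage.

// Assumption: a checkpoint directory must be set first; the path is a placeholder.
sc.setCheckpointDir("/tmp/spark-checkpoints")

val g = Graph(v, e)

// Flag the member RDDs for checkpointing BEFORE any action materialises them.
g.vertices.checkpoint()
g.edges.checkpoint()

// Materialise only after flagging, so the checkpoint data is actually written.
g.vertices.count()
g.edges.count()

println(g.vertices.isCheckpointed) // expected true if nothing materialised them earlier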