Re: Best practices for removing lineage of a RDD or Graph object?

dash Wed, 18 Jun 2014 22:52:24 -0700

Hi Roy, 

Thanks for your help, I write a small code snippet that could reproduce the 
problem.
Could you help me read through it and see if I did anything wrong?


Thanks!

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(“TEST")
      .setMaster("local[4]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .set("spark.kryo.registrator", "edu.nd.dsg.hdtm.util.HDTMKryoRegistrator")
    val sc = new SparkContext(conf)

    val v = sc.parallelize(Seq[(VertexId, Long)]((0L, 0L), (1L, 1L), (2L, 2L)))
    val e = sc.parallelize(Seq[Edge[Long]](Edge(0L, 1L, 0L), Edge(1L, 2L, 1L), 
Edge(2L, 0L, 2L)))
    val newGraph = Graph(v, e)
    var currentGraph = newGraph
    val vertexIds = currentGraph.vertices.map(_._1).collect()

    for (i <- 1 to 1000) {
      var g = currentGraph
      vertexIds.toStream.foreach(id => {
        g = Graph(currentGraph.vertices, currentGraph.edges)
        g.cache()
        g.edges.cache()
        g.vertices.cache()
        g.vertices.count()
        g.edges.count()
      })

      currentGraph.unpersistVertices(blocking =  false)
      currentGraph.edges.unpersist(blocking = false)
      currentGraph = g
      println(" iter "+i+" finished")
    }

  }


Baoxu Shi(Dash)
Computer Science and Engineering Department
University of Notre Dame
b...@nd.edu



> On Jun 19, 2014, at 1:47 AM, roy20021 [via Apache Spark User List] 
> <ml-node+s1001560n7892...@n3.nabble.com> wrote:
> 
> No sure if it can help, btw:
> Checkpoint cuts the lineage. The checkpoint method is a flag. In order to 
> actually perform the checkpoint you must do NOT materialise the RDD before it 
> has been flagged otherwise the flag is just ignored.
> 
> rdd2 = rdd1.map(..)
> rdd2.checkpoint()
> rdd2.count
> rdd2.isCheckpointed // true
> 
> Il mercoledì 18 giugno 2014, dash <[hidden email]> ha scritto:
> > If a RDD object have non-empty .dependencies, does that means it have
> > lineage? How could I remove it?
> >
> > I'm doing iterative computing and each iteration depends on the result
> > computed in previous iteration. After several iteration, it will throw
> > StackOverflowError.
> >
> > At first I'm trying to use cache, I read the code in pregel.scala, which is
> > part of GraphX, they use a count method to materialize the object after
> > cache, but I attached a debugger and seems such approach does not empty
> > .dependencies, and that also does not work in my code.
> >
> > Another alternative approach is using checkpoint, I tried checkpoint
> > vertices and edges for my Graph object and then materialize it by count
> > vertices and edges. Then I use .isCheckpointed to check if it is correctly
> > checkpointed, but it always return false.
> >
> >
> >
> > --
> > View this message in context: 
> > http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> > 
> 
> If you reply to this email, your message will be added to the discussion 
> below:
> http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779p7892.html
> To unsubscribe from Best practices for removing lineage of a RDD or Graph 
> object?, click here.
> NAML





--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Best-practices-for-removing-lineage-of-a-RDD-or-Graph-object-tp7779p7893.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Best practices for removing lineage of a RDD or Graph object?

Reply via email to