Re: Checkpointed RDD still causing StackOverflow

2014-06-24 Thread dash
Due to SPARK-2245, you cannot use count to materialize a VertexRDD. That actually materializes the PartitionRDD, so checkpointing the VertexRDD won't work. I'm trying to fix that right now.

Re: Checkpointed RDD still causing StackOverflow

2014-06-24 Thread Mayur Rustagi
Do not call collect, as that will both materialize the RDD and transfer the data to the driver (which might actually cause the driver to fail if the data is huge). You do have to materialize the RDD in some way (call save, count, or collect). Mayur Rustagi

Re: Checkpointed RDD still causing StackOverflow

2014-06-23 Thread Xiangrui Meng
Calling checkpoint() alone doesn't cut the lineage. It only marks the RDD for checkpointing. The lineage is cut after the first time this RDD is materialized. You see the StackOverflow because the lineage is still there. -Xiangrui On Sun, Jun 22, 2014 at 6:37 PM, dash wrote: > Hi Xiangrui, > > Ac
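
The mark-then-materialize behaviour described in this message can be sketched with a small toy model. This is plain Python, not Spark code: `ToyRDD`, `lineage_depth`, and `_compute` are hypothetical names invented for illustration, and the real checkpoint mechanism (writing to reliable storage in a separate job) is deliberately left out. The only point is that `checkpoint()` sets a flag, and the parent chain is dropped once an action such as `count()` has run.

```python
class ToyRDD:
    """Toy stand-in for an RDD (hypothetical, not the Spark API)."""

    def __init__(self, parent=None, fn=None, data=None):
        self.parent = parent   # lineage link to the upstream ToyRDD
        self.fn = fn           # transformation applied to the parent's data
        self.data = data       # source data, or data saved by a checkpoint
        self.marked = False    # set by checkpoint(); nothing is cut yet

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage.
        return ToyRDD(parent=self, fn=lambda xs: [fn(x) for x in xs])

    def checkpoint(self):
        # Only marks the RDD; the lineage is left untouched.
        self.marked = True

    def lineage_depth(self):
        depth, node = 0, self
        while node.parent is not None:
            depth += 1
            node = node.parent
        return depth

    def _compute(self):
        # Recursive on purpose: a long lineage means deep recursion.
        if self.parent is None:
            return self.data
        return self.fn(self.parent._compute())

    def count(self):
        # An action: materializes the data, then honours a pending mark.
        result = self._compute()
        if self.marked:
            self.data, self.parent, self.fn = result, None, None
            self.marked = False
        return len(result)


rdd = ToyRDD(data=[1, 2, 3])
for _ in range(50):
    rdd = rdd.map(lambda x: x + 1)

rdd.checkpoint()
depth_after_mark = rdd.lineage_depth()    # 50: marking alone cuts nothing
n = rdd.count()                           # 3: the action materializes the RDD
depth_after_action = rdd.lineage_depth()  # 0: lineage is cut only now

# Iterative use: mark and materialize every 10 steps to bound the lineage.
it = ToyRDD(data=[0])
max_depth = 0
for i in range(1, 101):
    it = it.map(lambda x: x + 1)
    max_depth = max(max_depth, it.lineage_depth())
    if i % 10 == 0:
        it.checkpoint()
        it.count()
```

In this sketch the lineage never grows beyond 10 links in the iterative loop, whereas without the `count()` call after `checkpoint()` it would reach 100 and the recursive `_compute` would go correspondingly deep.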

Re: Checkpointed RDD still causing StackOverflow

2014-06-22 Thread dash
Hi Xiangrui, As far as I know, calling count materializes the RDD; does collect do the same thing, since it is also an action? I cannot call count because, for a Graph object, count does not materialize the RDD. I have already filed an issue on that. My question is, why is there still a stac

Re: Checkpointed RDD still causing StackOverflow

2014-06-22 Thread Xiangrui Meng
After checkpoint(), please call count(). This is similar to cache(): checkpoint() only marks the RDD to be checkpointed. -Xiangrui On Sun, Jun 22, 2014 at 3:14 PM, dash wrote: > Hi, > > I'm doing iterative computing now, and due to lineage chain, we need to > checkpoint the RDD in order to