[GitHub] spark pull request: [SPARK-4094][CORE] checkpoint should still be ...

liyezhang556520 Tue, 28 Oct 2014 20:14:22 -0700

Github user liyezhang556520 commented on the pull request:

    https://github.com/apache/spark/pull/2956#issuecomment-60867864
  
    RDD checkpointing should also support a case like this:
    `rdd0 = sc.makeRDD(...)`
    `rdd1 = rdd0.flatMap(...)`
    `rdd1.collect()`
    `rdd0.checkpoint()`
    `rdd1.count()` // rdd0 should be checkpointed here
    This means that a checkpoint requested after an action should also work
    for RDDs on which the action was not called directly.
    Supporting this requires traversing the RDD lineage until RDDs that have
    already been checkpointed are reached. The traversal only checks the status
    of each RDD and does not trigger any re-computation, so it should have only
    a trivial impact on performance.
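
The traversal described above can be sketched with a toy model in plain Python. This is not Spark's implementation: `ToyRDD`, `run_action`, and the `checkpoint_requested` / `checkpointed` flags are hypothetical names introduced only for illustration, and no data is computed or written. The sketch only shows that walking the lineage after an action is enough to find an ancestor that was marked for checkpointing later, and that the walk stops at RDDs that are already checkpointed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # eq=False keeps instances hashable by identity
class ToyRDD:
    """Hypothetical stand-in for an RDD: a name, its parents, and two flags."""
    name: str
    parents: List["ToyRDD"] = field(default_factory=list)
    checkpoint_requested: bool = False
    checkpointed: bool = False

    def checkpoint(self) -> None:
        # Only marks the RDD; the work happens after the next action.
        self.checkpoint_requested = True


def run_action(rdd: ToyRDD) -> List[str]:
    """Simulate the step after an action: walk the lineage from `rdd`,
    checkpoint every marked ancestor, and stop descending at RDDs that
    are already checkpointed. Returns the names visited, in order."""
    visited: List[str] = []
    seen = set()
    stack = [rdd]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        visited.append(node.name)
        if node.checkpointed:
            continue  # already checkpointed: no need to look further up
        if node.checkpoint_requested:
            node.checkpointed = True  # status change only, no recomputation
        stack.extend(node.parents)
    return visited


rdd0 = ToyRDD("rdd0")
rdd1 = ToyRDD("rdd1", parents=[rdd0])

run_action(rdd1)            # like rdd1.collect(): nothing is marked yet
rdd0.checkpoint()           # requested after the first action
run_action(rdd1)            # like rdd1.count(): the walk reaches rdd0
print(rdd0.checkpointed)    # True
```

Each visit in this sketch is a flag check, which mirrors the argument that the traversal itself is cheap relative to running the job.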




