Github user liyezhang556520 commented on the pull request:
https://github.com/apache/spark/pull/2956#issuecomment-60867864
RDD checkpointing should also support a sequence like this:
`rdd0 = sc.makeRDD(...)`
`rdd1 = rdd0.flatMap(...)`
`rdd1.collect()`
`rdd0.checkpoint()`
`rdd1.count()` // rdd0 should be checkpointed here
This means that a checkpoint requested after an action should also take effect
on RDDs that the action was not called on directly.
This requires traversing the whole RDD lineage until reaching RDDs that have
already been checkpointed. But the traversal only checks the status of each
RDD and does not trigger any recomputation, so it should have only a trivial
impact on performance.
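
As a rough illustration of the traversal described above, here is a minimal
sketch using a toy lineage model. It does not use Spark's classes: `Node`,
`LineageWalk`, `doCheckpoint`, and the two flags are hypothetical names made up
for this example. It models only the status check along the lineage, not the
writing of checkpoint data.

```scala
// Toy stand-in for an RDD: a name, its parent nodes, and two status flags.
// Hypothetical model for illustration only; not Spark's RDD class.
final class Node(val name: String, val parents: Seq[Node]) {
  var checkpointRequested = false // set when checkpoint() is called
  var checkpointed = false        // set once the node is marked as checkpointed
  def checkpoint(): Unit = checkpointRequested = true
}

object LineageWalk {
  // Called on the node an action ran on. Walks up the lineage, stops at
  // nodes that are already checkpointed, and marks every node that had a
  // checkpoint requested. Returns the names of the newly marked nodes.
  // Only flags are inspected; no node is recomputed by the walk itself.
  def doCheckpoint(node: Node): List[String] =
    if (node.checkpointed) Nil
    else {
      val fromParents = node.parents.toList.flatMap(doCheckpoint)
      if (node.checkpointRequested) {
        node.checkpointed = true
        fromParents :+ node.name
      } else fromParents
    }
}

object Demo {
  def main(args: Array[String]): Unit = {
    val rdd0 = new Node("rdd0", Nil)
    val rdd1 = new Node("rdd1", Seq(rdd0))

    // Stands in for rdd1.collect(): nothing has requested a checkpoint yet.
    println(LineageWalk.doCheckpoint(rdd1))

    rdd0.checkpoint()

    // Stands in for rdd1.count(): the walk reaches rdd0 and marks it.
    println(LineageWalk.doCheckpoint(rdd1))
  }
}
```

In this model the cost of each walk is proportional to the number of
not-yet-checkpointed nodes in the lineage, which is the "trivial impact" the
comment refers to.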