persist() returns immediately because it only marks the RDD for persistence; nothing is computed at that point. count() is an action, so it triggers computation of rdd, and the partitions are persisted as they are computed. count() blocks until the job finishes, so the following transform can only start after count() returns, and therefore after the persistence completes. There are corner cases where you may still see part of rdd recomputed later, for example if a persisted block is lost or otherwise unavailable.
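
To make the ordering concrete, here is a rough toy model in plain Python. It is not Spark: ToyRDD, compute_calls and drop_block are names made up for illustration, and the real scheduler and block manager are far more involved. It only mimics the three behaviours described above: persist() sets a flag, the first action computes and caches each partition, and a lost block is recomputed on the next access.

```python
class ToyRDD:
    """Toy stand-in for a persisted RDD (hypothetical, not Spark's API)."""

    def __init__(self, partitions, compute):
        self.partitions = partitions   # list of input partitions
        self.compute = compute         # function applied to each element
        self.should_persist = False    # set by persist()
        self.cache = {}                # partition index -> computed block
        self.compute_calls = 0         # how many partitions were computed

    def persist(self):
        # Only marks the RDD for persistence; computes nothing.
        self.should_persist = True
        return self

    def _partition(self, i):
        # Serve from the cache if the block is there, else compute it.
        if i in self.cache:
            return self.cache[i]
        self.compute_calls += 1
        block = [self.compute(x) for x in self.partitions[i]]
        if self.should_persist:
            self.cache[i] = block
        return block

    def count(self):
        # Action: returns only after every partition has been computed.
        return sum(len(self._partition(i)) for i in range(len(self.partitions)))

    def collect(self):
        # A second action, standing in for "the following transform".
        return [x for i in range(len(self.partitions)) for x in self._partition(i)]

    def drop_block(self, i):
        # Simulates a persisted block being lost or evicted.
        self.cache.pop(i, None)


rdd = ToyRDD([[1, 2], [3, 4]], lambda x: x * 10).persist()
calls_after_persist = rdd.compute_calls   # persist() alone computed nothing
n = rdd.count()                           # computes and caches both partitions
calls_after_count = rdd.compute_calls
result = rdd.collect()                    # served entirely from the cache
calls_after_collect = rdd.compute_calls
rdd.drop_block(0)                         # lose one persisted block
result_again = rdd.collect()              # only the lost partition is recomputed
calls_after_loss = rdd.compute_calls
```

In this sketch the second action does no new computation after count(), and only the dropped partition is recomputed after the simulated loss, which is the corner case mentioned above.
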

On Sun, Mar 29, 2015 at 9:07 AM, Harut Martirosyan <[email protected]> wrote:
> Hi.
>
> rdd.persist()
> rdd.count()
>
> rdd.transform()...
>
> is there a chance transform() runs before persist() is complete?
>
> --
> RGRDZ Harut
