Re: RDD Persistance synchronization

Harut Martirosyan Sun, 29 Mar 2015 02:03:45 -0700

Thanks again, Sean.

The thing is, we persist and count that RDD in the hope that later
actions on it won't trigger recomputation of the earlier stages. It's not
really about performance here: the recomputation includes UUID generation,
and the UUIDs must stay the same for all further actions.


I understand that the RDD concept is based on lineage, and that this kind
of contradicts our goal, but is there any way to guarantee that it's
persisted, or to make it fail when persisting fails?
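
To illustrate the requirement that IDs stay the same across recomputation,
here is a minimal sketch in plain Python (no Spark involved). It assumes
each record has a stable key and that name-based UUIDs are acceptable in
place of random ones. JOB_NAMESPACE, stable_id and tag_records are
hypothetical names used only for this illustration.

```python
import uuid

# Hypothetical fixed namespace for the job; any constant UUID would do,
# as long as it never changes between runs.
JOB_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid/rdd-ids")


def stable_id(record_key: str) -> str:
    """Derive a name-based (version 5) UUID from a stable record key."""
    return str(uuid.uuid5(JOB_NAMESPACE, record_key))


def tag_records(keys):
    """Pair each key with its derived ID, as a map() over the data would."""
    return [(key, stable_id(key)) for key in keys]


# Two independent passes stand in for an initial computation and a later
# recomputation of the same partition.
first_pass = tag_records(["a", "b", "c"])
second_pass = tag_records(["a", "b", "c"])
```

With IDs derived like this, a recomputed partition would yield the same
values as the original one, so correctness would not depend on the
persisted blocks still being available.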

On 29 March 2015 at 12:51, Sean Owen <[email protected]> wrote:

> persist() completes immediately since it only marks the RDD for
> persistence. count() triggers computation of rdd, and as rdd is
> computed it will be persisted. The following transform should
> therefore only start after count() and therefore after the persistence
> completes. I think there might be corner cases where you still see
> some of rdd computed, like, if a persisted block is lost or otherwise
> unavailable later.
>
> On Sun, Mar 29, 2015 at 9:07 AM, Harut Martirosyan
> <[email protected]> wrote:
> > Hi.
> >
> > rdd.persist()
> > rdd.count()
> >
> > rdd.transform()...
> >
> > is there a chance transform() runs before persist() is complete?
> >
> > --
> > RGRDZ Harut
>



-- 
RGRDZ Harut
