If you want to ensure the persisted RDD has been computed before the parallel actions run, call foreach with a dummy function on it first to force evaluation.
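In code, the idea looks roughly like this (a minimal spark-shell sketch, where sc is predefined; the parallelize/map lineage is just a stand-in for whatever expensive computation you persist):

import org.apache.spark.storage.StorageLevel

// Stand-in for the expensive upstream computation.
val cached = sc.parallelize(1 to 1000000)
  .map(i => i * 2)
  .persist(StorageLevel.MEMORY_AND_DISK)

// Dummy action: touches every element and discards it, forcing the
// whole RDD to be computed and written to the cache up front.
cached.foreach(_ => ())

// Only now fork the real actions; each thread reads the cached blocks
// instead of re-running the lineage.
val actions = Seq(
  new Thread(new Runnable { def run(): Unit = println(cached.count()) }),
  new Thread(new Runnable { def run(): Unit = println(cached.reduce(_ + _)) })
)
actions.foreach(_.start())
actions.foreach(_.join())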
--
Michael Mior
michael.m...@gmail.com

On Thu, Sep 24, 2020 at 00:38, Arya Ketan <ketan.a...@gmail.com> wrote:
>
> Thanks, we were able to validate the same behaviour.
>
> On Wed, 23 Sep 2020 at 18:05, Sean Owen <sro...@gmail.com> wrote:
>>
>> It is, but it happens asynchronously. If you access the same block
>> twice in quick succession, the cached block may not yet be available
>> the second time.
>>
>> On Wed, Sep 23, 2020, 7:17 AM Arya Ketan <ketan.a...@gmail.com> wrote:
>>>
>>> Hi,
>>> I have a Spark Streaming use case (Spark 2.2.1). My Spark job has
>>> multiple actions, which I run in parallel by executing each action
>>> in a separate thread. I call rdd.persist, after which the DAG forks
>>> into multiple actions, but the RDD is not being cached and the
>>> entire DAG is executed twice (once per action).
>>>
>>> What am I missing?
>>> Arya
>>
> --
> Arya

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org