If you want to ensure the persisted RDD has been computed before the other
actions run, just call foreach with a dummy function to force evaluation.
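
For example, a minimal Scala sketch (runnable as the body of a Spark
application; rdd and its map step are stand-ins for your real pipeline):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(
      new SparkConf().setAppName("force-cache").setMaster("local[*]"))

    // Stand-in for the real pipeline that produces the RDD.
    val rdd = sc.parallelize(1 to 100000).map(_ * 2)
    rdd.persist(StorageLevel.MEMORY_ONLY)

    // Dummy action: computes and caches every partition up front.
    rdd.foreach(_ => ())

    // Actions run afterwards (even from separate threads) read the
    // cached blocks instead of re-running the whole DAG.
    val total = rdd.count()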

--
Michael Mior
michael.m...@gmail.com

On Thu, Sep 24, 2020 at 00:38, Arya Ketan <ketan.a...@gmail.com> wrote:
>
> Thanks, we were able to validate the same behaviour.
>
> On Wed, 23 Sep 2020 at 18:05, Sean Owen <sro...@gmail.com> wrote:
>>
>> It is, but it happens asynchronously. If you access the same block twice
>> in quick succession, the cached block may not yet be available the second
>> time.
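>>
>> A hypothetical sketch of that race (assuming an existing RDD named rdd,
>> persisted just before two actions are launched from separate threads):
>>
>>     import scala.concurrent.Future
>>     import scala.concurrent.ExecutionContext.Implicits.global
>>
>>     rdd.persist()
>>     // If both actions start before the first has finished writing the
>>     // cached blocks, each may recompute the partitions itself.
>>     val a = Future { rdd.count() }
>>     val b = Future { rdd.count() }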
>>
>> On Wed, Sep 23, 2020, 7:17 AM Arya Ketan <ketan.a...@gmail.com> wrote:
>>>
>>> Hi,
>>> I have a Spark Streaming use case (Spark 2.2.1), and my Spark job has
>>> multiple actions, which I run in parallel by executing them in separate
>>> threads. I call rdd.persist(), after which the DAG forks into multiple
>>> actions. But I see that RDD caching is not happening and the entire DAG
>>> is executed twice (once in each action).
>>>
>>> What am I missing?
>>> Arya
>>>
>>>
>>
>>
> --
> Arya
