I found this paper, which seems to answer most of the questions about the RDD
lifecycle:
https://www.cs.berkeley.edu/~matei/papers/2012/hotcloud_spark_streaming.pdf
Tian
On Tuesday, November 25, 2014 4:02 AM, Mukesh Jha
<[email protected]> wrote:
Hey Experts,
I want to understand in detail the lifecycle of RDDs in a streaming app.
From my current understanding:
- An RDD gets created out of the real-time input stream.
- Transformation functions are applied lazily to the RDD to transform it into
  other RDD(s).
- Actions are taken on the final transformed RDDs to get the data out of the
  system.
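
The lazy-transformation-then-action flow described above can be sketched in
plain Python. This is a toy model only, not Spark's API: `LazyBatch`, its
`map`/`filter`/`collect` methods, and the `double` helper are hypothetical names
chosen for illustration. It shows that transformations merely record work, and
nothing is computed until an action runs.

```python
from typing import Callable, Iterable, List, Optional


class LazyBatch:
    """Toy stand-in for one batch's RDD (hypothetical, not a Spark class)."""

    def __init__(self, source: Iterable, steps: Optional[List[Callable]] = None):
        self._source = source
        self._steps = list(steps or [])

    def map(self, fn: Callable) -> "LazyBatch":
        # Transformation: returns a new LazyBatch and computes nothing yet.
        return LazyBatch(self._source, self._steps + [lambda it: (fn(x) for x in it)])

    def filter(self, pred: Callable) -> "LazyBatch":
        # Transformation: also only recorded, not executed.
        return LazyBatch(self._source, self._steps + [lambda it: (x for x in it if pred(x))])

    def collect(self) -> list:
        # Action: only here is the recorded chain actually evaluated.
        data = iter(self._source)
        for step in self._steps:
            data = step(data)
        return list(data)


calls = []  # records every element that the transformation actually touches


def double(x):
    calls.append(x)
    return x * 2


batch = LazyBatch([1, 2, 3, 4])
transformed = batch.map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)  # no element has been processed so far
result = transformed.collect()    # the action triggers the whole chain
```

In this sketch `double` is not called at all until `collect()` runs, which is
the behaviour the word "lazy" refers to in the list above.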
Also, RDDs are stored in the cluster's RAM (or on disk, if so configured) and
are cleaned up in LRU fashion.
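
As an illustration of what "cleaned in LRU fashion" means, here is a minimal
sketch of a least-recently-used store in plain Python. It models only the
policy as stated in the paragraph above, under the assumption of a fixed
capacity counted in blocks; `LruBlockStore` and the `batch-N` ids are
hypothetical and say nothing about how Spark's own storage layer is
implemented.

```python
from collections import OrderedDict


class LruBlockStore:
    """Toy cache: keeps at most `capacity` blocks, evicting the least recently used."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._blocks = OrderedDict()  # oldest entry first, newest last

    def put(self, block_id, data):
        self._blocks[block_id] = data
        self._blocks.move_to_end(block_id)  # mark as most recently used
        while len(self._blocks) > self.capacity:
            self._blocks.popitem(last=False)  # drop the least recently used block

    def get(self, block_id):
        if block_id not in self._blocks:
            return None  # block was never stored or has been evicted
        self._blocks.move_to_end(block_id)  # a read also counts as a use
        return self._blocks[block_id]

    def block_ids(self):
        return list(self._blocks)


store = LruBlockStore(capacity=2)
store.put("batch-1", [1, 2])
store.put("batch-2", [3, 4])
store.get("batch-1")          # touch batch-1, so batch-2 becomes the oldest
store.put("batch-3", [5, 6])  # over capacity: batch-2 is evicted
remaining = store.block_ids()
evicted = store.get("batch-2")
```

Under a pure LRU policy like this one, a block that has not been read recently
can disappear as soon as newer blocks need the space, whether or not some
consumer still intends to read it.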
So I have the following questions:
- How does Spark (Streaming) guarantee that all the actions are taken on each
  input RDD/batch?
- How does Spark determine that the lifecycle of an RDD is complete? Is there
  any chance that an RDD will be cleaned out of RAM before all actions have
  been taken on it?
Thanks in advance for all your help. Also, I'm relatively new to Scala & Spark,
so pardon me in case these are naive questions/assumptions.
--
Thanks & Regards,
Mukesh Jha