Re: Lifecycle of RDD in spark-streaming

Harihar Nahak Thu, 27 Nov 2014 14:53:14 -0800

When new data arrives on a stream, Spark uses its streaming classes to
convert it into RDDs and, as you mention, these are followed by
transformations and finally actions. As far as I have experienced, all RDDs
remain in memory until the user destroys them or the application ends.
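
To make the flow described above concrete, here is a toy sketch in plain
Python. It is not the Spark API: ToyRDD and run_stream are hypothetical names
made up for illustration, and they only mimic the order of events (a batch
arrives, transformations are recorded lazily, an action triggers the work).
It says nothing about how Spark actually manages memory.

```python
# Toy model only: plain Python, NOT the Spark API.
# ToyRDD and run_stream are hypothetical names used for illustration.
from typing import Callable, Iterable, List, Optional


class ToyRDD:
    """Stand-in for an RDD: source data plus a chain of pending transformations."""

    def __init__(self, source: Iterable, ops: Optional[List[Callable]] = None):
        self._source = list(source)
        self._ops = list(ops or [])

    def map(self, fn: Callable) -> "ToyRDD":
        # Transformation: lazy, it only records fn and computes nothing yet.
        return ToyRDD(self._source, self._ops + [fn])

    def collect(self) -> list:
        # Action: applies every recorded transformation, in order, right now.
        out = self._source
        for fn in self._ops:
            out = [fn(x) for x in out]
        return out


def run_stream(batches, transform, action) -> list:
    """For each incoming batch: wrap it in a ToyRDD, transform it, run the action."""
    results = []
    for batch in batches:
        rdd = ToyRDD(batch)            # new data on the stream becomes an "RDD"
        transformed = transform(rdd)   # still lazy at this point
        results.append(action(transformed))  # the action forces the computation
    return results
```

For example, run_stream([[1, 2], [3]], lambda r: r.map(lambda x: x + 1),
lambda r: sum(r.collect())) runs the action once per batch, and a
map() on its own never calls the supplied function until collect() is used.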



On 26 November 2014 at 20:05, Mukesh Jha [via Apache Spark User List] <
ml-node+s1001560n19835...@n3.nabble.com> wrote:

> Any pointers guys?
>
> On Tue, Nov 25, 2014 at 5:32 PM, Mukesh Jha <[hidden email]> wrote:
>
>> Hey Experts,
>>
>> I wanted to understand in detail the lifecycle of RDDs in a
>> streaming app.
>>
>> From my current understanding:
>> - An RDD gets created out of the real-time input stream.
>> - Transformation functions are applied lazily to the RDD to
>> transform it into other RDD(s).
>> - Actions are taken on the final transformed RDDs to get the data out of
>> the system.
>>
>> Also, RDDs are stored in the cluster's RAM (disk if configured so) and
>> are cleaned in LRU fashion.
>>
>> So I have the following questions:
>> - How does Spark Streaming guarantee that all the actions are taken on each
>> input RDD/batch?
>> - How does Spark determine that the lifecycle of an RDD is complete? Is
>> there any chance that an RDD will be cleaned out of RAM before all actions
>> are taken on it?
>>
>> Thanks in advance for all your help. Also, I'm relatively new to Scala &
>> Spark, so pardon me in case these are naive questions/assumptions.
>>
>> --
>> Thanks & Regards,
>>
>> *[hidden email]*
>>



-- 
Regards,
Harihar Nahak
BigData Developer
Wynyard
Email:hna...@wynyardgroup.com | Extn: 8019




View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Lifecycle-of-RDD-in-spark-streaming-tp19749p19987.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
