Re: Spark RDD and Memory

Aditya Thu, 22 Sep 2016 21:54:57 -0700

Thanks for the reply.

One more question.

How does Spark handle data if it does not fit in memory? The answer I got is that it flushes the data to disk and so handles the memory issue.
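For data that has been explicitly cached, what happens when it does not fit depends on the storage level. Per the Spark programming guide, with MEMORY_ONLY the partitions that do not fit are simply not cached and are recomputed when needed, while with MEMORY_AND_DISK they are written to disk and read back from there. Data that is not cached is not held in memory as a whole: tasks stream through partitions, and shuffles and sorts spill to disk when execution memory runs short. The minimal sketch below only inspects the flags of the built-in StorageLevel constants; the helper `describe` is hypothetical, and it assumes spark-core 2.x on the classpath.

```scala
import org.apache.spark.storage.StorageLevel

object StorageLevelSketch {
  // Hypothetical helper: summarises where cached partitions may live for a given level.
  def describe(level: StorageLevel): String =
    if (level.useMemory && level.useDisk) "memory, with partitions that do not fit written to disk"
    else if (level.useMemory) "memory only; partitions that do not fit are recomputed when needed"
    else if (level.useDisk) "disk only"
    else "not cached"

  def main(args: Array[String]): Unit = {
    Seq(StorageLevel.MEMORY_ONLY, StorageLevel.MEMORY_AND_DISK, StorageLevel.DISK_ONLY, StorageLevel.NONE)
      .foreach(level => println(s"$level: ${describe(level)}"))
  }
}
```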

Also, in the example below:
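val textFile = sc.textFile("/user/emp.txt")
val textFile1 = sc.textFile("/user/emp1.xt")
// join needs (key, value) pairs: assume comma-separated lines keyed by their first field
val join = textFile.map(l => (l.split(",", 2)(0), l)).join(textFile1.map(l => (l.split(",", 2)(0), l)))
join.saveAsTextFile("/home/output")
val count = join.count()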

When the first action is performed, it loads textFile and textFile1 in memory, performs the join and saves the result. But when the second action (count) is called, does it again load textFile and textFile1 in memory and perform the join again? If it does, what is the correct way to prevent it from loading the same data again?
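Without caching, each action re-evaluates the lineage of `join`. A minimal sketch, assuming spark-core 2.x in local mode, hypothetical in-memory `id,value` lines standing in for the two files, and `collect()` in place of `saveAsTextFile` so nothing is written: a LongAccumulator counts how often the post-join step runs, and persisting `join` before the first action lets the second action read the cached partitions instead. One nuance: because join involves a shuffle, Spark can usually reuse the shuffle files from the first job and skip re-reading the inputs, but the work after the shuffle is still redone unless `join` is cached.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object JoinReuseSketch {
  // join needs key/value pairs, so key each "id,value" line by its first field.
  def keyed(lines: RDD[String]): RDD[(String, String)] =
    lines.map { line =>
      val parts = line.split(",", 2)
      (parts(0), parts(1))
    }

  // Runs two actions on the joined RDD and returns how many times
  // the post-join step executed in total.
  def joinComputations(sc: SparkContext, cacheJoin: Boolean): Long = {
    val computed = sc.longAccumulator("joinComputations")
    // Hypothetical in-memory stand-ins for the two input files.
    val textFile  = sc.parallelize(Seq("1,alice", "2,bob", "3,carol"))
    val textFile1 = sc.parallelize(Seq("1,hr", "2,ops", "4,sales"))
    val join = keyed(textFile).join(keyed(textFile1)).map { row => computed.add(1); row }
    if (cacheJoin) join.persist(StorageLevel.MEMORY_ONLY)
    join.collect() // first action (stands in for saveAsTextFile)
    join.count()   // second action: recomputes the post-shuffle work unless join is cached
    if (cacheJoin) join.unpersist()
    computed.sum
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-reuse"))
    try {
      println(s"without cache: ${joinComputations(sc, cacheJoin = false)}")
      println(s"with cache:    ${joinComputations(sc, cacheJoin = true)}")
    } finally sc.stop()
  }
}
```

With two matching keys, the post-join step runs 4 times without caching and 2 times with `join` persisted.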


On Thursday 22 September 2016 11:12 PM, Mich Talebzadeh wrote:

Hi,
unpersist works on storage memory, not execution memory. So I do not think you can flush it out of memory if you have not cached it in the first place, using cache or something like below:
s.persist(org.apache.spark.storage.StorageLevel.MEMORY_ONLY)

s.unpersist
I believe recent versions of Spark use a Least Recently Used (LRU) mechanism to flush unused data out of memory, much like RDBMS cache management. I know LLDAP does that.
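To illustrate that unpersist only releases storage memory held by RDDs that were explicitly persisted, here is a minimal sketch, assuming spark-core 2.x in local mode and a small in-memory RDD named `s` standing in for the one above (the `Snapshot` type and `snapshot` helper are hypothetical). It checks `getStorageLevel` and `SparkContext.getPersistentRDDs` for an RDD that was never cached, then persists `s` with MEMORY_ONLY, materialises it with an action, and unpersists it. Automatic eviction under memory pressure is not exercised here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistUnpersistSketch {
  final case class Snapshot(level: StorageLevel, registered: Boolean)

  // Records the RDD's storage level and whether the context tracks it as persistent.
  private def snapshot(sc: SparkContext, rdd: RDD[_]): Snapshot =
    Snapshot(rdd.getStorageLevel, sc.getPersistentRDDs.contains(rdd.id))

  // Returns snapshots for: an uncached RDD after an action, `s` after persist
  // plus an action, and `s` after unpersist.
  def run(sc: SparkContext): (Snapshot, Snapshot, Snapshot) = {
    val plain = sc.parallelize(1 to 10).map(_ * 2)
    plain.count()                       // computed, but never stored: nothing to unpersist
    val uncached = snapshot(sc, plain)

    val s = sc.parallelize(1 to 10).map(_ + 1)
    s.persist(StorageLevel.MEMORY_ONLY) // only marks s; blocks are stored when an action runs
    s.count()
    val persisted = snapshot(sc, s)

    s.unpersist(blocking = true)        // removes s's cached blocks from storage memory
    (uncached, persisted, snapshot(sc, s))
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("persist-unpersist"))
    try run(sc).productIterator.foreach(println)
    finally sc.stop()
  }
}
```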
HTH



Dr Mich Talebzadeh
LinkedIn/https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw/
http://talebzadehmich.wordpress.com
On 22 September 2016 at 18:09, Hanumath Rao Maduri <hanu....@gmail.com> wrote:
    Hello Aditya,

    After an intermediate action has been applied, you might want to
    call rdd.unpersist() to let Spark know that this RDD is no longer
    required.
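Applied to the original question, a minimal sketch of that advice, assuming spark-core 2.x in local mode and hypothetical in-memory, already-keyed inputs named `left` and `right`: the inputs are cached, the joined result is cached and materialised, and the inputs are then unpersisted. A LongAccumulator on the post-join step shows that a later action on the cached join does not recompute it, and `getPersistentRDDs` shows that only the join is still held in storage memory. If the inputs had never been persisted, there would be nothing to release.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object ReleaseInputsSketch {
  // Returns (post-join computations across both actions, joined RDD id, ids still persisted).
  def run(sc: SparkContext): (Long, Int, Set[Int]) = {
    val computed = sc.longAccumulator("postJoinComputations")
    // Hypothetical inputs, already keyed by employee id, cached up front.
    val left  = sc.parallelize(Seq(("e1", "alice"), ("e2", "bob"))).persist(StorageLevel.MEMORY_ONLY)
    val right = sc.parallelize(Seq(("e1", "hr"), ("e2", "ops"))).persist(StorageLevel.MEMORY_ONLY)

    val joined = left.join(right).map { row => computed.add(1); row }.persist(StorageLevel.MEMORY_ONLY)
    joined.count()                 // first action: materialises left, right and joined

    // joined is cached now, so the inputs are no longer required.
    left.unpersist(blocking = true)
    right.unpersist(blocking = true)

    joined.collect()               // second action: served from joined's cached blocks
    val result = (computed.sum, joined.id, sc.getPersistentRDDs.keySet.toSet)
    joined.unpersist(blocking = true)
    result
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("release-inputs"))
    try println(run(sc))
    finally sc.stop()
  }
}
```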

    Thanks,
    -Hanu

    On Thu, Sep 22, 2016 at 7:54 AM, Aditya
    <aditya.calangut...@augmentiq.co.in> wrote:

        Hi,

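        Suppose I have two RDDs:
        val textFile = sc.textFile("/user/emp.txt")
        val textFile1 = sc.textFile("/user/emp1.xt")

        Later I perform a join operation on the above two RDDs:
        // join needs (key, value) pairs: assume comma-separated lines keyed by their first field
        val join = textFile.map(l => (l.split(",", 2)(0), l)).join(textFile1.map(l => (l.split(",", 2)(0), l)))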

        There are subsequent transformations that do not involve
        textFile and textFile1 any further, followed by an action that
        starts the execution.

        When the action is called, textFile and textFile1 will be loaded
        in memory first. Later the join will be performed and kept in memory.
        My question is: once join is in memory and is used for
        subsequent execution, what happens to the textFile and textFile1
        RDDs? Are they still kept in memory until the full lineage
        graph is completed, or are they destroyed once their use is over? If
        they are kept in memory, is there any way I can explicitly remove
        them from memory to free it?
