Well, I don't know what an "in-memory only" Spark is going to achieve. The
Spark UI shows the amount of disk usage pretty well, and memory is used
first by default anyway.
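
For instance, a rough sketch in Scala (assuming an existing SparkSession
called spark; the dataset is purely illustrative). Dataset.persist()
defaults to MEMORY_AND_DISK, so cached partitions sit in memory first and
overflow to disk only under memory pressure, which the Storage tab of the
Spark UI then shows:

  import org.apache.spark.storage.StorageLevel

  // Dataset.persist() defaults to MEMORY_AND_DISK: partitions are kept
  // in memory first and spilled to local disk only when memory runs short.
  val df = spark.range(0L, 100000000L)
  df.persist(StorageLevel.MEMORY_AND_DISK)
  df.count()  // materialize the cache, then check the Storage tab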

Spark is no different from any other predominantly in-memory application.
Effectively it performs the classic disk-based Hadoop MapReduce operation
"in memory" to speed up processing, but it is still an application on top of
the OS. So, as with most applications, there are states of Spark, the
running code and the OS(s) where disk usage will be needed.
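
To make that concrete, a minimal sketch (the path is purely illustrative;
cluster managers such as YARN may override it): even a job that never
persists anything to disk still needs a local scratch directory for shuffle
and spill files, controlled by spark.local.dir:

  import org.apache.spark.sql.SparkSession

  // Even an "in-memory" job writes shuffle and spill files to local disk.
  // spark.local.dir is where those scratch files land.
  val spark = SparkSession.builder()
    .appName("scratch-dir-demo")
    .master("local[2]")
    .config("spark.local.dir", "/tmp/spark-scratch")
    .getOrCreate()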

This is akin to swap space on the OS itself, and I quote: "Swap space is
used when your operating system decides that it needs physical memory for
active processes and the amount of available (unused) physical memory is
insufficient. When this happens, inactive pages from the physical memory
are then moved into the swap space, freeing up that physical memory for
other uses."

 free
               total        used        free      shared  buff/cache   available
 Mem:       65659732    30116700     1429436     2341772    34113596    32665372
 Swap:     104857596      550912   104306684

HTH


   View my LinkedIn profile:
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Fri, 20 Aug 2021 at 12:50, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> I've been exploring BlockManager and the stores for a while now and am
> tempted to say that a memory-only Spark setup would be possible (except
> shuffle blocks). Is this correct?
>
> What about shuffle blocks? Do they have to be stored on disk (in
> DiskStore)?
>
> I think broadcast variables are in-memory first, so unless an on-disk
> storage level is explicitly used (by Spark devs), there's no reason not to
> have Spark in-memory only.
>
> (I was told that one of the differences between Trino/Presto and Spark SQL
> is that Trino keeps all processing in memory and will blow up, while Spark
> uses disk to avoid OOMEs.)
>
> Regards,
> Jacek Laskowski
> ----
> https://about.me/JacekLaskowski
> "The Internals Of" Online Books <https://books.japila.pl/>
> Follow me on https://twitter.com/jaceklaskowski
>
