Hi,

I've been exploring BlockManager and the block stores for a while now and am
tempted to say that a memory-only Spark setup would be possible (except for
shuffle blocks). Is this correct?

What about shuffle blocks? Do they have to be stored on disk (in DiskStore)?

I think broadcast variables are kept in memory first, so unless an on-disk
storage level is explicitly requested (by Spark devs), there's no reason not
to run Spark in-memory only.

(I was told that one of the differences between Trino/Presto and Spark SQL
is that Trino keeps all processing in memory only and will blow up when it
runs out, while Spark spills to disk to avoid OOMEs.)

Best regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski
