Hello Jacek,
On 20/8/21 2:49 PM, Jacek Laskowski wrote:
Hi,
I've been exploring BlockManager and the stores for a while now and am
tempted to say that a memory-only Spark setup would be possible
(except for shuffle blocks). Is this correct?
Correct.
What about shuffle blocks? Do they have to be stored on disk (in
DiskStore)?
By default, Spark stores shuffle blocks on disk.
I think broadcast variables are stored in memory first, so unless an
on-disk storage level is used explicitly (by Spark devs), there's no
reason not to run Spark in-memory only.
(I was told that one of the differences between Trino/Presto and Spark
SQL is that Trino keeps all processing in memory and will blow up when
memory runs out, while Spark uses disk to avoid OOMEs.)
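
To make the storage-level point above concrete, here is a minimal sketch in
plain Python. It is not Spark's API: `Level`, `is_memory_only_setup` and
`disk_users` are hypothetical names made up for illustration, and the model
keeps only two flags, named after the useDisk/useMemory flags of Spark's
StorageLevel. It checks whether a set of requested storage levels could ever
touch disk. Shuffle blocks are deliberately left out, since they go to disk by
default regardless of the storage level.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    """Hypothetical stand-in for a Spark storage level (two flags only)."""
    name: str
    use_disk: bool
    use_memory: bool


# Assumption: these flag values mirror the like-named Spark storage levels.
MEMORY_ONLY = Level("MEMORY_ONLY", use_disk=False, use_memory=True)
MEMORY_AND_DISK = Level("MEMORY_AND_DISK", use_disk=True, use_memory=True)
DISK_ONLY = Level("DISK_ONLY", use_disk=True, use_memory=False)


def is_memory_only_setup(levels):
    """Return True when none of the requested levels may write to disk."""
    return all(not level.use_disk for level in levels)


def disk_users(levels):
    """Return the names of the levels that may write to disk."""
    return [level.name for level in levels if level.use_disk]
```

For example, `is_memory_only_setup([MEMORY_ONLY])` holds, while adding
`MEMORY_AND_DISK` to the list makes it fail and `disk_users` names the culprit.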
Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski
Best,
Iacovos