Hello Jacek,

On 20/8/21 2:49 PM, Jacek Laskowski wrote:
Hi,

I've been exploring BlockManager and the stores for a while now and am tempted to say that a memory-only Spark setup would be possible (except for shuffle blocks). Is this correct?
Correct.

What about shuffle blocks? Do they have to be stored on disk (in DiskStore)?
Well, by default Spark stores shuffle blocks on disk.

I think broadcast variables are stored in memory first, so unless an on-disk storage level is used explicitly (by Spark devs), there's no reason not to run Spark in-memory only.

(I was told that one of the differences between Trino/Presto and Spark SQL is that Trino keeps all processing in memory only and will blow up, while Spark uses disk to avoid OOMEs.)

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
"The Internals Of" Online Books <https://books.japila.pl/>
Follow me on https://twitter.com/jaceklaskowski

Best,
Iacovos
