I'm running Spark locally on my laptop to explore how persistence impacts
memory use. As an example problem, I'm generating 80 MB matrices in numpy
and then simply adding them together.
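Roughly, the job looks like this (a simplified sketch rather than my exact
script; NUM, the matrix size, and the persistence level are the knobs I've
been varying):

    import numpy as np
    from pyspark import SparkContext, StorageLevel

    NUM = 50      # number of matrices (placeholder value)
    SIDE = 3163   # 3163*3163 float64s is roughly 80 MB per matrix

    sc = SparkContext("local[4]", "persist-memory-test")

    def make_matrix(seed):
        # build a deterministic ~80 MB matrix for each element
        rng = np.random.RandomState(seed)
        return rng.rand(SIDE, SIDE)

    mats = sc.parallelize(range(NUM)).map(make_matrix)
    mats.persist(StorageLevel.MEMORY_AND_DISK)  # the level I keep changing
    total = mats.reduce(lambda a, b: a + b)     # elementwise sum of all matrices
    print(total.sum())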
No matter what I set NUM or the persistence level to in the code above, I
get out-of-memory errors like these: https://gist.githu
Is there any way to disk-back the RDDs by default (something analogous to
mmap?) so that they don't create memory pressure in the system at all? With
compute taking this long, the added overhead of disk and network IO is
quite minimal.
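For concreteness, the closest thing I can find in the API is something like
this (reusing the definitions from the sketch above; StorageLevel.DISK_ONLY
is PySpark's disk-only level, though I'm not sure it avoids memory pressure
entirely):

    # Ideally this would keep cached partitions on disk only, never
    # competing for worker memory (mmap-style).
    mats_disk = sc.parallelize(range(NUM)).map(make_matrix)
    mats_disk.persist(StorageLevel.DISK_ONLY)
    total = mats_disk.reduce(lambda a, b: a + b)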
Thanks!
...Eric Jonas