Hi, I have a Spark ML workflow that uses several persist calls. When I
launch it on a 1 TB dataset, it brings down the whole cluster because it
fills all the disk space at /yarn/nm/usercache/root/appcache:
http://i.imgur.com/qvRUrOp.png
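
For context, the persist calls look roughly like this (a simplified sketch; the input path and transformations are made up, the real pipeline is a larger Spark ML workflow):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  val sc = new SparkContext(new SparkConf().setAppName("ml-workflow"))

  // hypothetical input; the real job reads ~1 TB from HDFS
  val raw = sc.textFile("hdfs:///data/events")
  val features = raw
    .map(_.split(","))
    .persist(StorageLevel.MEMORY_AND_DISK_SER)

  // materializing the cache: serialized blocks that don't fit in executor
  // memory are spilled to the executors' local dirs, which on YARN end up
  // under /yarn/nm/usercache/<user>/appcache/<application_id>/
  features.count()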
I found a YARN setting:
yarn.nodemanager.localizer.cache.target-size-mb - Target size of the
localizer cache in MB, per NodeManager. It is a target retention size
that only includes resources with PUBLIC and PRIVATE visibility and
excludes resources with APPLICATION visibility.
But it excludes resources with APPLICATION visibility, and the Spark cache,
as I understand it, is of the APPLICATION type.
Is it possible to restrict disk space for a Spark application? Will Spark
fail if it isn't able to persist to disk
(StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data source?
Thanks,
Peter Rudenko