Hi, I have a Spark ML workflow that uses several persist calls. When I launch it against a 1 TB dataset, it brings the whole cluster down because it fills all the disk space under /yarn/nm/usercache/root/appcache: http://i.imgur.com/qvRUrOp.png
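To give an idea, the persist calls look roughly like this (a simplified sketch assuming the DataFrame API; the paths, names, and partition count are made up):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.storage.StorageLevel

object MlWorkflowSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ml-workflow"))
    val sqlContext = new SQLContext(sc)

    // The reused intermediate result is persisted serialized; partitions
    // that don't fit in executor memory are spilled to the executors'
    // local dirs, which on YARN live under .../usercache/<user>/appcache.
    val features = sqlContext.read.parquet("hdfs:///data/features") // hypothetical path
      .repartition(2000)
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    // ... several ML stages reuse `features` here ...

    features.unpersist() // released once the downstream stages finish
  }
}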

I found a YARN setting:
yarn.nodemanager.localizer.cache.target-size-mb - the target size of the localizer cache in MB, per NodeManager. It is a target retention size that only includes resources with PUBLIC and PRIVATE visibility and excludes resources with APPLICATION visibility.

But it excludes resources with APPLICATION visibility, and the Spark cache, as I understand it, is of the APPLICATION type.
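For reference, that property would go into yarn-site.xml roughly like this (10240 MB is just YARN's default, shown only as an illustration):

<property>
  <!-- Target retention size of the NodeManager's localized resource cache;
       only PUBLIC and PRIVATE resources count towards it. -->
  <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
  <value>10240</value>
</property>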

Is it possible to restrict disk space for a Spark application? And will Spark fail if it cannot persist to disk (StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data source?
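For context, here is a minimal sketch of the two behaviours the question is about, based on the documented storage-level semantics (sc and the path are placeholders for our real input; runnable e.g. in spark-shell):

import org.apache.spark.SparkContext
import org.apache.spark.storage.StorageLevel

def cacheFeatures(sc: SparkContext, useDisk: Boolean) = {
  val features = sc.textFile("hdfs:///data/features")  // hypothetical path
  if (useDisk)
    // Partitions that don't fit in memory are written, serialized, to the
    // executors' local directories (the appcache dirs on YARN).
    features.persist(StorageLevel.MEMORY_AND_DISK_SER)
  else
    // Partitions that don't fit in memory are not cached at all and are
    // recomputed from the lineage (re-read from the source) when reused.
    features.persist(StorageLevel.MEMORY_ONLY_SER)
}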

Thanks,
Peter Rudenko