Hi, I have a Spark ML workflow that uses several persist calls. When I
launch it on a 1 TB dataset, it brings down the whole cluster because it
fills all the disk space at /yarn/nm/usercache/root/appcache:
http://i.imgur.com/qvRUrOp.png
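
For context, the persist calls look roughly like this (a simplified sketch; the input path and transformations are made up, the real pipeline is a larger Spark ML workflow):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.apache.spark.storage.StorageLevel

  val sc = new SparkContext(new SparkConf().setAppName("ml-workflow"))

  // hypothetical input; the real job reads ~1 TB from HDFS
  val raw = sc.textFile("hdfs:///data/events")
  val features = raw
    .map(_.split(","))
    .persist(StorageLevel.MEMORY_AND_DISK_SER)

  // materializing the cache: serialized blocks that don't fit in executor
  // memory are spilled to the executors' local dirs, which on YARN end up
  // under /yarn/nm/usercache/<user>/appcache/<application_id>/
  features.count()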
I found a YARN setting:
yarn.nodemanager.localizer.cache.target-size-mb - Target size of the
localizer cache in MB, per NodeManager. It is a target retention size
that only includes resources with PUBLIC and PRIVATE visibility and
excludes resources with APPLICATION visibility.
But it excludes resources with APPLICATION visibility, and the Spark cache,
as I understand it, is of the APPLICATION type.
Is it possible to restrict disk space for a Spark application? Will Spark
fail if it isn't able to persist to disk
(StorageLevel.MEMORY_AND_DISK_SER), or will it recompute from the data source?
Thanks,
Peter Rudenko