YARN only has the ability to kill a container; it cannot checkpoint it or suspend it with a signal. If a container uses too much memory, YARN will simply kill it according to the configured limits. https://issues.apache.org/jira/browse/YARN-2172
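For reference, a minimal sketch of the two knobs that determine the limit YARN enforces in this case, assuming Spark 1.x on YARN (the values mirror the thread and are illustrative, not recommendations):

    from pyspark import SparkConf, SparkContext

    # Illustrative values only. spark.yarn.executor.memoryOverhead is the
    # Spark 1.x setting name and is given in MB; the container size YARN
    # enforces is roughly executor memory plus this off-heap overhead.
    conf = (SparkConf()
            .setMaster("yarn-client")
            .set("spark.executor.memory", "8g")
            .set("spark.yarn.executor.memoryOverhead", "6144"))
    sc = SparkContext(conf=conf)

Everything the Python workers allocate counts against that same container limit, which is why a pile of pyspark.daemon children can push a container past it.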
On Friday, January 23, 2015, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> Hi Sven,
>
> What version of Spark are you running? Recent versions have a change that
> allows PySpark to share a pool of processes instead of starting a new one
> for each task.
>
> -Sandy
>
> On Fri, Jan 23, 2015 at 9:36 AM, Sven Krasser <kras...@gmail.com> wrote:
>
>> Hey all,
>>
>> I am running into a problem where YARN kills containers for being over
>> their memory allocation (which is about 8G for executors plus 6G for
>> overhead), and I noticed that in those containers there are tons of
>> pyspark.daemon processes hogging memory. Here's a snippet from a container
>> with 97 pyspark.daemon processes. The total sum of RSS usage across all of
>> these is 1,764,956 pages (i.e. 6.7GB on the system).
>>
>> Any ideas what's happening here and how I can get the number of
>> pyspark.daemon processes back to a more reasonable count?
>>
>> 2015-01-23 15:36:53,654 INFO [Reporter] yarn.YarnAllocationHandler
>> (Logging.scala:logInfo(59)) - Container marked as failed:
>> container_1421692415636_0052_01_000030. Exit status: 143. Diagnostics:
>> Container [pid=35211,containerID=container_1421692415636_0052_01_000030] is
>> running beyond physical memory limits. Current usage: 14.9 GB of 14.5 GB
>> physical memory used; 41.3 GB of 72.5 GB virtual memory used. Killing
>> container.
>> Dump of the process-tree for container_1421692415636_0052_01_000030 :
>> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS)
>> SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
>> |- 54101 36625 36625 35211 (python) 78 1 332730368 16834 python -m
>> pyspark.daemon
>> |- 52140 36625 36625 35211 (python) 58 1 332730368 16837 python -m
>> pyspark.daemon
>> |- 36625 35228 36625 35211 (python) 65 604 331685888 17694 python -m
>> pyspark.daemon
>>
>> [...]
>>
>> Full output here: https://gist.github.com/skrasser/e3e2ee8dede5ef6b082c
>>
>> Thank you!
>> -Sven
>>
>> --
>> krasser <http://sites.google.com/site/krasser/?utm_source=sig>
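For context, the change Sandy refers to is presumably the Python worker reuse introduced around Spark 1.2: workers are kept in a pool and handed new tasks instead of a fresh pyspark.daemon child being forked per task. A minimal sketch of setting it explicitly, assuming a version that has the feature (the setting name spark.python.worker.reuse is from the Spark configuration docs, where it defaults to true):

    from pyspark import SparkConf, SparkContext

    # Keep Python workers alive and reuse them across tasks instead of
    # forking a new pyspark.daemon child for each task.
    conf = SparkConf().set("spark.python.worker.reuse", "true")
    sc = SparkContext(conf=conf)
    print(conf.get("spark.python.worker.reuse"))

On older versions that predate the pooling change, upgrading is the fix rather than this flag.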