On Fri, Oct 3, 2014 at 1:01 PM, Skip Montanaro <skip.montan...@gmail.com> wrote:
> On Fri, Oct 3, 2014 at 1:36 PM, Croepha <croe...@gmail.com> wrote:
>
>> Long running Python jobs that consume a lot of memory while
>> running may not return that memory to the operating system
>> until the process actually terminates, even if everything is
>> garbage collected properly.
>
> The problem boils down to how the program dynamically allocates
> and frees memory, and how the malloc subsystem interacts with
> the kernel through the brk and sbrk system calls. (Anywhere I
> mention "brk", you can mentally replace it with "sbrk". They do
> the same thing - ask for memory from or return memory to the
> kernel - using a different set of units, memory addresses or
> bytes.)
>
> In the general case, programmers call malloc (or calloc, or
> realloc) to allocate a chunk of storage from the heap. (I'm
> ignoring anything special which Python has layered on top of
> malloc. It can mitigate problems, but I don't think it will
> fundamentally change the way malloc interacts with the kernel.)
> The malloc subsystem maintains a free list (recently freed bits
> of memory) from which it can allocate memory without traipsing
> off to the kernel. If it can't return a chunk of memory from the
> free list, it will (in the most simpleminded malloc
> implementation) call brk to grab a new (large) chunk of memory.
> The system simply moves the end of the program's "break",
> effectively increasing or decreasing the (virtual) size of the
> running program. That memory is then doled out to the user by
> malloc.
>
> If, and only if, every chunk of memory in the last chunk
> allocated by a call to brk is placed on malloc's free list,
> *and* if the particular malloc implementation on your box is
> smart enough to coalesce adjacent chunks of freed memory back
> into brk-sized memory chunks, can brk be called once again to
> reduce the program's footprint.

Actually, ISTR hearing that glibc's malloc+free will use mmap+munmap to allocate and release chunks of memory, to avoid fragmentation.

Digging around on the 'net a bit, it appears that glibc's malloc does do this (so on most Linux systems), but only for contiguous chunks of memory above 128K in size.
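
For anyone who wants to poke at this without strace, here is a minimal sketch that watches the process's resident set size around one large allocation. It assumes Linux (it reads /proc/self/statm) and a CPython built against glibc. The helper names rss_bytes and measure are made up for illustration, and the 64 MiB default is just an arbitrary size well above the 128K threshold mentioned above. It is not one of the demonstration programs, and what it prints will depend on your allocator and its settings.

```python
import os


def rss_bytes():
    """Return this process's resident set size in bytes.

    Assumption: Linux, where /proc/self/statm exists and its second
    field is the resident page count. Returns None anywhere else.
    """
    try:
        with open("/proc/self/statm") as f:
            resident_pages = int(f.read().split()[1])
    except (OSError, IndexError, ValueError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def measure(size=64 * 1024 * 1024):
    """Sample RSS before, during, and after holding one large block."""
    before = rss_bytes()
    # Fill the block with non-zero bytes so the pages are really touched;
    # a zero-filled buffer might never become resident.
    block = b"x" * size
    during = rss_bytes()
    # Dropping the only reference frees the block immediately in CPython.
    del block
    after = rss_bytes()
    return before, during, after


if __name__ == "__main__":
    for label, value in zip(("before", "during", "after"), measure()):
        print(label, value)
```

If the block really is served by mmap and released by munmap, I would expect "after" to fall back to roughly "before". Repeating the experiment with a great many small objects instead of one big one is where the brk behaviour Skip describes would be more likely to show up.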
Here's a pair of demonstration programs (one in C, one in CPython 3.4), which when run under strace on a Linux system, appear to show that mmap and munmap are being used:
http://stromberg.dnsalias.org/~strombrg/malloc-and-sbrk.html

HTH