On 06/29/2016 05:00 PM, Christoph Lameter wrote:
> On Wed, 29 Jun 2016, Nikolay Borisov wrote:
>
>> I've observed a rather strange unbounded growth of the kmalloc-192
>> slab cache:
>>
>>      OBJS    ACTIVE  USE  OBJ SIZE     SLABS  OBJ/SLAB  CACHE SIZE  NAME
>> 711124869 411527215   3%     0.19K  16934908        42  135479264K  kmalloc-192
>>
>> Essentially the cache is around 130 GB, yet only 3 percent of it is
>> being used. In this case I'd like to shrink the overall size of the
>> cache. How can that be achieved? I tried echoing '1' to
>> /sys/kernel/slab/kmalloc-192/shrink, but nothing changed.
>
> OK, this probably means that most slabs hold only one or a few objects.
> Some workloads can result in situations like that. Can you enable
> debugging and get a list of the functions where these objects are
> allocated?
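(As an aside on the shrink attempt quoted above, a minimal sketch of what
that knob does, as I understand SLUB's shrink path:

	echo 1 > /sys/kernel/slab/kmalloc-192/shrink

It only hands completely empty slab pages back to the page allocator and
sorts the partial lists; a slab page holding even a single live object
stays pinned. That would be consistent with nothing changing here when
almost every slab still has one or two objects in it.)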
Right, so what debugging do you have in mind, concretely? So far I have
rebooted the machine with SLUB merging disabled, since quite a few slab
caches were being merged into that particular one:

	:t-0000192 <- cred_jar pid_3 inet_peer_cache request_sock_TCPv6
	              kmalloc-192 file_lock_cache bio-0 ip_dst_cache key_jar

I suspect it is one of the networking caches or bio-0, since the others
do not seem to be used much.

>
>> This is on 3.12, which is a rather old kernel, but I still believe it
>> is entirely possible for someone to flood a machine with network
>> requests that cause a lot of objects to be allocated, making a
>> particular slab cache grow; later, when the request flood stops, the
>> cache would be almost empty, yet the memory would not be usable for
>> anything other than satisfying allocations from this cache.
>
> True. This is a long-known problem, and all my attempts to facilitate a
> solution here have not gone anywhere. The essential solution would
> require objects to be movable out of, or removable from, the sparsely
> allocated page frames. And that goes way beyond my subsystem.
>
> If you can figure out which subsystem allocates or frees these objects
> (through the call traces), then we may find a knob in that subsystem to
> clear them out once in a while.
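For the call traces, here is a sketch of what I plan to try, assuming
CONFIG_SLUB_DEBUG is enabled in this kernel (these are the stock SLUB
debug knobs, nothing specific to this box):

	# on the kernel command line: track alloc/free callers for this
	# cache and keep caches unmerged
	slub_debug=U,kmalloc-192 slub_nomerge

	# then, once the cache has grown again:
	cat /sys/kernel/slab/kmalloc-192/alloc_calls
	cat /sys/kernel/slab/kmalloc-192/free_calls

alloc_calls/free_calls list each call site with an object count, so the
subsystem holding the bulk of these sparsely used objects should stand
out. (The U flag enlarges each object to store the tracking data, so the
cache will be somewhat bigger while this is enabled.)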