Hi Hamish,

> NB: On desktop I seem to have a very high number for "SUnreclaim" in
> /proc/meminfo:
>
> MemTotal:       32812004 kB
> MemFree:         8619976 kB
> MemAvailable:    9572924 kB
> Buffers:           61772 kB
> Cached:          1061212 kB
> SwapCached:      1190212 kB
...
> Slab:           16832040 kB
> SReclaimable:     397472 kB
> SUnreclaim:     16434568 kB

I think SUnreclaim is a key number to monitor over time.

> What does SUnreclaimable mean?

Slab is the memory used by the kernel's own memory allocator.  Think of
it as malloc(3) but with knowledge of the type of item being allocated.
Some slab allocations can be freed when there are other demands on
memory: ‘memory pressure’.  That's SReclaimable.  SUnreclaim is the
amount allocated to things which must be kept no matter how high memory
pressure goes.  In your figures, Slab is the sum of the two.
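
To put numbers on that, here is a rough Python sketch that parses the
figures you quoted and checks how the three slab fields relate.  The
function name parse_meminfo and the SAMPLE string are my own; the values
are copied from your /proc/meminfo output above.

```python
def parse_meminfo(text):
    """Return {field: kB} from /proc/meminfo-style text.

    Leading '> ' email quoting is tolerated so pasted output works too.
    """
    values = {}
    for line in text.splitlines():
        line = line.lstrip("> ").strip()
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


# Values taken from the quoted /proc/meminfo output.
SAMPLE = """\
MemTotal:       32812004 kB
Slab:           16832040 kB
SReclaimable:     397472 kB
SUnreclaim:     16434568 kB
"""

info = parse_meminfo(SAMPLE)

# Slab is the reclaimable and unreclaimable parts added together.
assert info["Slab"] == info["SReclaimable"] + info["SUnreclaim"]

share = info["SUnreclaim"] / info["MemTotal"]
print(f"SUnreclaim is {share:.0%} of MemTotal")
```

So about half of your RAM is sitting in slab memory that the kernel
cannot give back under pressure.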

> This isn't the same on the NAS box, but either way there are two
> problems to debug here and I guess they could be related.

The desktop PC will be easier because you've more tools and more
upstream parties interested in any report.  Does the NAS last the day?
If so, a scheduled nightly reboot, or kexec, would work around it for
now.
https://wiki.archlinux.org/index.php/Kexec

> Do I have a kernel memory leak?

Probably.

> > ‘sudo slabtop -osc’ will give a breakdown.
...
> Okay, that yields: http://ix.io/2x4T
>
> The total is much smaller than the number in /proc/meminfo (just
> verified it hasn't changed drastically). Bizarre.

That is odd.
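
One way to cross-check slabtop's total is to add up /proc/slabinfo
yourself.  The sketch below assumes the ‘slabinfo - version: 2.1’ column
layout and that a cache's size is num_slabs × pagesperslab × page size,
which is how I understand slabtop to work it out; treat both as
assumptions.  The function name and the two sample cache lines are made
up for illustration.

```python
def slab_total_kb(slabinfo_text, page_size=4096):
    """Sum the pages held by every cache in /proc/slabinfo-style text.

    Assumes the version 2.1 layout: field 5 is <pagesperslab> and the
    second-to-last field is <num_slabs>.
    """
    total_pages = 0
    for line in slabinfo_text.splitlines():
        # Skip the version line, the column header, and blank lines.
        if not line.strip() or line.startswith(("slabinfo", "#")):
            continue
        fields = line.split()
        pages_per_slab = int(fields[5])
        num_slabs = int(fields[-2])
        total_pages += pages_per_slab * num_slabs
    return total_pages * page_size // 1024


# Invented sample data in the 2.1 layout, not taken from a real machine.
SAMPLE = """\
slabinfo - version: 2.1
# name <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-64   12345 12800  64 64 1 : tunables 0 0 0 : slabdata 200 200 0
inode_cache    900  1000 608 26 4 : tunables 0 0 0 : slabdata  40  40 0
"""

# 200 slabs * 1 page + 40 slabs * 4 pages = 360 pages of 4 KiB each.
print(slab_total_kb(SAMPLE), "kB")
```

On the real machine you would feed it the contents of /proc/slabinfo,
which normally needs root to read, and pass
os.sysconf("SC_PAGE_SIZE") as the page size.  If that sum is also far
below Slab in /proc/meminfo then the missing memory isn't accounted to
any named cache.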

Are you using any VM stuff?  Any disk filesystems other than ext4,
e.g. ZFS?  Nvidia graphics drivers?  Is this the machine where a kernel
driver keeps dying?

Monitor SUnreclaim at a regular interval, e.g. every 30 seconds, so you
can see it climbing.  You said you were doing a large upload.  If it's
the kind which can recover from being stopped and restarted then see
whether your monitoring shows a steady climb during the upload which
stops when you kill the upload and starts again when you resume it.
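
A minimal Python sketch of that monitoring, assuming the usual
/proc/meminfo line format.  The names sunreclaim_kb and monitor, and the
count parameter for limiting the number of samples, are my own
invention.

```python
import time


def sunreclaim_kb(meminfo_text):
    """Return the SUnreclaim value, in kB, from /proc/meminfo text."""
    for line in meminfo_text.splitlines():
        if line.startswith("SUnreclaim:"):
            return int(line.split()[1])
    raise ValueError("no SUnreclaim line found")


def monitor(interval=30, count=None, path="/proc/meminfo"):
    """Print SUnreclaim and its change every `interval` seconds.

    With count=None it runs until interrupted; otherwise it takes
    `count` samples.  Returns the last reading in kB.
    """
    taken = 0
    previous = None
    while count is None or taken < count:
        if previous is not None:
            time.sleep(interval)
        with open(path) as f:
            current = sunreclaim_kb(f.read())
        delta = 0 if previous is None else current - previous
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} SUnreclaim {current} kB ({delta:+d} kB)", flush=True)
        previous = current
        taken += 1
    return previous
```

Calling monitor() with no arguments samples every 30 seconds until you
press Ctrl-C.  Redirect the output to a file, then stop and resume the
upload and see whether the deltas follow it.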

> My swap is meant to be for an emergency, not because some leaky code
> in a driver/the kernel/whatever is somehow managing to use 20gb of
> ram.

Swap isn't held in reserve purely as an overflow for when RAM runs low.
Even when there is plenty of RAM free, the kernel might decide to swap
out some memory which has no other backing store, i.e. anonymous
memory, because it thinks that RAM would be better used as cache.

BTW, curl(1) did download something.  ;-)

-- 
Cheers, Ralph.

-- 
  Next meeting: Online, Jitsi, Tuesday, 2020-10-06 20:00
  Check to whom you are replying
  Meetings, mailing list, IRC, ...  http://dorset.lug.org.uk
  New thread, don't hijack:  mailto:dorset@mailman.lug.org.uk
