Thanks very much for the responses and guidance. For some belated closure
(and for the archives): we gradually decreased mds_cache_memory_limit by a
few hundred MB at a time while monitoring the MDS, and everything was fine.
New mds_cache_memory_limit is 318208819200 bytes (about 318GB).
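
For anyone finding this in the archives, here is a rough sketch of the stepwise approach. It only computes the sequence of limits and prints the corresponding commands; it does not run anything. The helper names, the MDS name "a", the 500MB step, and the starting value (the thread only gives it as "451GB") are all assumptions for illustration. `injectargs` is one way to apply the setting at runtime; check what is appropriate for your release, and watch the MDS between steps.

```python
from itertools import islice


def limit_schedule(current, target, step):
    """Yield successively smaller limits (in bytes) from current down to target.

    Each value is at most `step` bytes below the previous one, and the
    last value yielded is exactly `target`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if target > current:
        raise ValueError("target must not exceed current")
    limit = current
    while limit > target:
        limit = max(target, limit - step)
        yield limit


def inject_command(mds_name, limit):
    """Build (but do not run) the command that sets the new limit."""
    return (
        f"ceph tell mds.{mds_name} injectargs "
        f"'--mds_cache_memory_limit={limit}'"
    )


if __name__ == "__main__":
    MB = 10**6
    # Assumed starting point: "451GB" read as 451 * 10**9 bytes.
    start = 451 * 10**9
    target = 318208819200
    # Show only the first few steps; apply one, monitor, then continue.
    for limit in islice(limit_schedule(start, target, 500 * MB), 3):
        print(inject_command("a", limit))
```

Clamping the last step with `max(target, ...)` keeps the final value from undershooting the target when the step size does not divide the gap evenly.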
On Wed, May 27, 2020 at 10:09 PM Dylan McCulloch wrote:
>
> Hi all,
>
> The single active MDS on one of our Ceph clusters is close to running out of
> RAM.
>
> MDS total system RAM = 528GB
> MDS current free system RAM = 4GB
> mds_cache_memory_limit = 451GB
> current mds cache usage = 426GB
Hi Dylan,
It looks like you have 10GB of heap waiting to be released. Try
`ceph tell mds.$(hostname) heap release` to free that up.
Otherwise, I've found it safe to incrementally inject decreased
mds_cache_memory_limit values on prod MDSs running v12.2.12. I'd start by
decreasing the size just a few hundred