Hello all,

I'm maintaining a small Nautilus cluster with 12 OSDs (36 TB raw). My mon nodes 
have the mgr/mds collocated with the mon, and each node is allocated 10 GB of RAM.

During a recent single-disk failure and the resulting recovery, I noticed my 
mgrs/mons were getting OOM-killed and restarted every ~5 hours, with the mgr 
using around 6.5 GB on all my nodes. My monitoring shows an interesting sawtooth 
pattern: network usage (peaking at 100 MB/s), disk space usage, and disk I/O 
(peaking at 300 MB/s against SSDs) all climb in parallel with memory usage, then 
drop at each restart.

I know the docs for hardware recommendations say:
> Monitor and manager daemon memory usage generally scales with the size of the 
> cluster. For small clusters, 1-2 GB is generally sufficient. For large 
> clusters, you should provide more (5-10 GB).

Now, I would like to think my cluster is on the small side of things, so I was 
hoping 10 GB would be enough for the mgr and mon (my OSD nodes are only allocated 
32 GB of RAM), but that assumption appears to be false.

So I was wondering how mgrs (and, to a lesser extent, mons) are expected to 
scale in terms of memory. Is it the OSD count, the size of the OSDs, the number 
of PGs, etc.? And is there a way to limit the amount of RAM used by these mgrs? 
(It seems the mon_osd_cache_size and rocksdb_cache_size settings apply only to 
mons, if I'm not mistaken.)
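For reference, this is roughly how I've been poking at the settings, plus the 
external workaround I'm considering in the meantime. This is only a sketch: the 
grep is just a crude filter, and the systemd cap assumes systemd-managed daemons 
(MemoryMax needs cgroup-v2 support; the 4G value is an arbitrary example, not a 
recommendation):

```shell
# Current mon-side cache settings (the two knobs mentioned above):
ceph config get mon mon_osd_cache_size
ceph config get mon rocksdb_cache_size

# Effective config of a running mgr, crudely filtered for anything
# cache- or memory-related I might have missed ($(hostname) assumes
# the daemon id matches the host name, which it does on my nodes):
ceph config show mgr.$(hostname) | grep -i -e cache -e mem

# Stopgap: cap the mgr's memory at the cgroup level with a systemd
# override, so it gets restarted at the cap instead of OOMing the node.
# `systemctl edit` opens a drop-in file; add the two lines shown:
systemctl edit ceph-mgr@$(hostname)
#   [Service]
#   MemoryMax=4G
systemctl restart ceph-mgr@$(hostname)
```

The systemd cap obviously doesn't fix whatever is growing, it just contains the 
blast radius, so I'd still like to understand what the mgr's memory actually 
scales with.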

Regards,
Mark
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
