Good data point on MONs not trimming while non-active+clean PGs are present.
So am I reading this correctly? It grew to 32GB? Did it end up growing
beyond that, and what was the max? Also, is ~18 PGs per OSD a reasonable
number? I would think about quadruple that would be ideal. Is this an
artifact of a steadily growing cluster or a design choice?
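For what it's worth, the ~18 figure can be sanity-checked from the PG counts
in Wido's status output quoted below. A quick sketch; the replicated size of 3
is an assumption (the mail says replication is over racks, but not the size):

```shell
# Sum the PG states from the quoted "ceph status" output and divide by
# the OSD count. Assumption: replicated pools with size 3.
total_pgs=$((33488 + 4802 + 1670 + 263 + 250 + 54 + 27 + 13 + 2))
osds=2175
echo "total PGs:                  $total_pgs"                # 40569
echo "PGs per OSD (primaries):    $((total_pgs / osds))"     # 18
echo "PG copies per OSD (size 3): $((total_pgs * 3 / osds))" # 55
```

So ~18 counts each PG once; counting replicas, each OSD carries closer to
56 PG copies, which is nearer the commonly cited ~100-per-OSD target.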

On Sat, Feb 3, 2018 at 10:50 AM, Wido den Hollander <w...@42on.com> wrote:

> Hi,
>
> I just wanted to inform people about the fact that Monitor databases can
> grow quite big when you have a large cluster which is performing a very
> long rebalance.
>
> I'm posting this on ceph-users and ceph-large as it applies to both, but
> you'll see this sooner on a cluster with a lot of OSDs.
>
> Some information:
>
> - Version: Luminous 12.2.2
> - Number of OSDs: 2175
> - Data used: ~2PB
>
> We are in the middle of migrating from FileStore to BlueStore and this is
> causing a lot of PGs to backfill at the moment:
>
>              33488 active+clean
>              4802  active+undersized+degraded+remapped+backfill_wait
>              1670  active+remapped+backfill_wait
>              263   active+undersized+degraded+remapped+backfilling
>              250   active+recovery_wait+degraded
>              54    active+recovery_wait+degraded+remapped
>              27    active+remapped+backfilling
>              13    active+recovery_wait+undersized+degraded+remapped
>              2     active+recovering+degraded
>
> This has been running for a few days now and it has caused this warning:
>
> MON_DISK_BIG mons srv-zmb03-05,srv-zmb04-05,srv-zmb05-05,srv-zmb06-05,srv-zmb07-05 are using a lot of disk space
>     mon.srv-zmb03-05 is 31666 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb04-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb05-05 is 31670 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb06-05 is 31897 MB >= mon_data_size_warn (15360 MB)
>     mon.srv-zmb07-05 is 31891 MB >= mon_data_size_warn (15360 MB)
>
> This is to be expected, as MONs do not trim their store while one or more
> PGs are not active+clean.
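If this kind of growth is expected (as it was here), the warning threshold can
be raised so the cluster doesn't sit in HEALTH_WARN for days. A hedged
ceph.conf sketch: mon_data_size_warn is the option named in the warning above;
on Luminous its value is given in bytes, and the 64 GiB below is only an
illustrative choice, not a recommendation:

```
[mon]
# Default is 15 GiB (the 15360 MB in the warning). Raise only when a
# long rebalance is expected and the mon disks have room to spare.
mon_data_size_warn = 68719476736
```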
>
> In this case we expected this and the MONs are each running on a 1TB Intel
> DC-series SSD to make sure we do not run out of space before the backfill
> finishes.
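A 32 GB store on a 1 TB device leaves plenty of headroom. A throwaway sketch
of that check; check_headroom is a hypothetical helper, and the figures are
taken from the warning above (largest store 31897 MB, 1 TB ≈ 1048576 MB):

```shell
# Hypothetical helper: what fraction of the mon's dedicated disk does
# the store currently use? Integer percent is fine for a rough check.
check_headroom() {
  local store_mb=$1 disk_mb=$2
  echo "store uses $(( store_mb * 100 / disk_mb ))% of the mon disk"
}
check_headroom 31897 $(( 1024 * 1024 ))  # prints: store uses 3% of the mon disk
```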
>
> The cluster is spread out over racks and in CRUSH we replicate over racks.
> Rack by rack we are wiping/destroying the OSDs and bringing them back as
> BlueStore OSDs and letting the backfill handle everything.
>
> In between we wait for the cluster to become HEALTH_OK (all PGs
> active+clean) so that the Monitors can trim their database before we start
> with the next rack.
>
> I just want to warn and inform people about this. Under normal
> circumstances a MON database isn't that big, but if you have a very long
> period of backfills/recoveries and also have a large number of OSDs you'll
> see the DB grow quite big.
>
> This has improved significantly going to Jewel and Luminous, but it is
> still something to watch out for.
>
> Make sure your MONs have enough free space to handle this!
>
> Wido
>
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>



-- 
Respectfully,

Wes Dillingham
wes_dilling...@harvard.edu
Research Computing | Senior CyberInfrastructure Storage Engineer
Harvard University | 38 Oxford Street, Cambridge, Ma 02138 | Room 204