On Tue, Oct 15, 2019 at 2:42 AM Jeremi Avenant <jer...@idia.ac.za> wrote:

> Good day
>
> I'm currently administering a Ceph cluster that consists of HDDs &
> SSDs. The rule for cephfs_data (EC) currently writes to both of these
> device classes (HDD+SSD). I would like to change it so that
> cephfs_metadata (non-EC) writes to SSD and cephfs_data (erasure coded,
> "EC") writes to HDD, since we're experiencing high disk latency.
>
> 1) The first option that comes to mind would be to migrate each pool to a
> new rule, but this would mean moving a tonne of data around. (How is disk
> space calculated for this? If I have 600 TB in an EC pool, do I need
> another 600 TB to move it over, or does the existing pool shrink as the
> new pool grows while the data moves?)
>
> 2) I would like to know if the alternative is possible,
> i.e. delete the SSDs from the default host bucket (leaving everything else
> as it is) and move the metadata pool to the SSD-based CRUSH rule.
>
> However, I'm not sure if this is possible, as it would mean deleting a
> leaf from a bucket in our default root. And when you add a new SSD OSD,
> where does it end up?
>
> crush map - http://pastefile.fr/6f37e7e594a61d0edd9dc947349c756b
> ceph osd pool ls detail -
> http://pastefile.fr/0f215e1252ec58c144d9abfe1688adc8
> osd tree - http://pastefile.fr/2acdd377a2db021b6af2996929b85082
>
> If anyone has any input it would be greatly appreciated.
>

What version of Ceph are you running? You may be able to use device classes
instead of munging the CRUSH tree.
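
For example (assuming Luminous or newer, since that is when device classes
were added), you can check what the cluster already knows about its devices
with something like:

  ceph versions                      # device classes need Luminous (12.x) or newer
  ceph osd crush class ls            # should list "hdd" and "ssd" if they were auto-detected
  ceph osd crush tree --show-shadow  # shows the per-class shadow trees CRUSH would use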

Updating the rule to change the destinations will only move data around (it
may be a large data movement), and it only needs as much free space as the
PGs in flight use. For instance, if your PGs are 100 GB each and the erasure
coding is 10+2, then each PG takes 10 GB on each OSD. If osd_max_backfills =
1, you only need 10 GB of headroom on each OSD to make the data movement; if
osd_max_backfills = 2, you need 20 GB, as two PGs may be moved onto an OSD
before any PGs are deleted from it.
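
If you want to keep that headroom requirement (and the client impact) down,
you can pin osd_max_backfills to 1 at runtime, roughly like this (a sketch
only; the change is not persisted across OSD restarts):

  ceph tell osd.* injectargs '--osd-max-backfills 1'   # runtime only; add to ceph.conf to persist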

By changing the data pool's rule to use only the HDD device class, it will
migrate the data off the SSDs and onto the HDDs (only moving PG shards as
needed). You can then change the replicated rule for the metadata pool to use
only SSD, and it will migrate the PG replicas off the HDDs.
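
Roughly, the rule changes could look like the following (untested sketch: the
rule and profile names are made up, and the k/m values must match whatever
your existing EC pool was created with -- you cannot change k/m on an
existing pool, only its crush rule):

  # replicated rule restricted to SSDs, then point the metadata pool at it
  ceph osd crush rule create-replicated replicated_ssd default host ssd
  ceph osd pool set cephfs_metadata crush_rule replicated_ssd

  # EC profile and rule restricted to HDDs, then point the data pool at the new rule
  ceph osd erasure-code-profile set ec_hdd k=10 m=2 crush-failure-domain=host crush-device-class=hdd
  ceph osd crush rule create-erasure cephfs_data_hdd ec_hdd
  ceph osd pool set cephfs_data crush_rule cephfs_data_hdd

You can confirm the existing profile's settings with
'ceph osd erasure-code-profile get <profile>' before creating the new one.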

Setting the following in /etc/ceph/ceph.conf on the OSDs and restarting the
OSDs before backfilling will reduce the impact of the backfills.

osd op queue = wpq
osd op queue cut off = high
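
If you restart the OSDs in batches, it may also be worth setting noout first
so the restarts themselves don't trigger extra recovery (assuming a systemd
deployment; adjust the unit names to your setup):

  ceph osd set noout
  systemctl restart ceph-osd.target   # run per node, one node at a time
  ceph osd unset noout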

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io