Hmm, I didn't know we had this functionality before. It looks to be
changing quite a lot at the moment, so be aware this will likely
require reconfiguring later.

On Sun, May 5, 2019 at 10:40 AM Kyle Brantley <k...@averageurl.com> wrote:
>
> I've been running luminous / ceph-12.2.11-0.el7.x86_64 on CentOS 7 for about
> a month now, and have had a few occasions where I've needed to recreate the
> OSDs on a server. (No, I'm not planning on doing this routinely...)
>
> What I've noticed is that recovery is generally staggered so that the pools
> on the cluster finish around the same time (+/- a few hours). What I'm hoping
> to do is prioritize specific pools over others, so that ceph recovers all of
> pool 1 before it moves on to pool 2, for example.
>
> In the docs, recovery_{,op}_priority both have roughly the same description,
> which is "the priority set for recovery operations", as well as a valid range
> of 1-63, default 5. This doesn't tell me whether a value of 1 is considered a
> higher priority than 63, and it doesn't tell me how it fits in with other
> ceph operations.

I'm not seeing this in the luminous docs; are you sure? The source
code indicates that in Luminous the valid range is 0-254. (As I said,
things have changed, so in the current master build it seems to be
-10 to 10 and is configured a bit differently.)

The 1-63 values generally apply to op priorities within the OSD, and
are used as part of a weighted priority queue when selecting the next
op to work on out of those available. You may have been looking at
osd_recovery_op_priority, which is on that scale and should apply to
individual recovery messages/ops, but it will not schedule PGs
differently.
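
If you do want to lower the priority of individual recovery ops
relative to client ops, that's an OSD config option rather than a pool
setting. A rough sketch (untested; the value 1 is just an illustration,
not a recommendation):

    # persistent: in ceph.conf under [osd]
    #   osd recovery op priority = 1
    # or at runtime on a running cluster:
    ceph tell osd.* injectargs '--osd_recovery_op_priority 1'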

> Questions:
> 1) If I have pools 1-4, what would I set these values to in order to backfill 
> pools 1, 2, 3, and then 4 in order?

So if I'm reading the code right, they just need to be different
weights, and the higher value will win when trying to get a
reservation if there's a queue of them. (However, it's possible that a
lower-priority pool will send off its requests first and get to do one
or two PGs before the higher-priority pool takes over and does all of
its work, after which the lower-priority pool continues.)
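
For your four pools (names taken from your list below), a minimal
sketch would look like this; the exact values don't matter beyond being
distinct, with the higher value winning the reservation:

    ceph osd pool set cephfs_metadata recovery_priority 4
    ceph osd pool set vm recovery_priority 3
    ceph osd pool set storage recovery_priority 2
    ceph osd pool set cephfs_data recovery_priority 1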

> 2) Assuming this is possible, how do I ensure that backfill isn't prioritized 
> over client I/O?

This is an ongoing issue but I don't think the pool prioritization
will change the existing mechanisms.
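
The existing mechanisms I'm thinking of are the usual OSD-level
throttles, which are independent of the pool priorities; for example
(values are illustrative only, tune them for your hardware):

    # limit how much backfill/recovery work each OSD does at once
    ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
    # there are also the osd_recovery_sleep_* options if you need to
    # slow recovery down further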

> 3) Is there a command that enumerates the weights of the current operations 
> (so that I can observe what's going on)?

"ceph osd pool ls detail" will include them.

>
> For context, my pools are:
> 1) cephfs_metadata
> 2) vm (RBD pool, VM OS drives)
> 3) storage (RBD pool, VM data drives)
> 4) cephfs_data
>
> These are sorted by both size (smallest to largest) and criticality of 
> recovery (most to least). If there's a critique of this setup / a better way 
> of organizing this, suggestions are welcome.
>
> Thanks,
> --Kyle