Hmm, I didn't know we had this functionality before. It looks to be changing quite a lot at the moment, so be aware this will likely require reconfiguring later.
On Sun, May 5, 2019 at 10:40 AM Kyle Brantley <k...@averageurl.com> wrote:
>
> I've been running luminous / ceph-12.2.11-0.el7.x86_64 on CentOS 7 for about
> a month now, and had a few times when I've needed to recreate the OSDs on a
> server. (no I'm not planning on routinely doing this...)
>
> What I've noticed is that the recovery will generally stagger the recovery so
> that the pools on the cluster will finish around the same time (+/- a few
> hours). What I'm hoping to do is prioritize specific pools over others, so
> that ceph will recover all of pool 1 before it moves on to pool 2, for
> example.
>
> In the docs, recovery_{,op}_priority both have roughly the same description,
> which is "the priority set for recovery operations" as well as a valid range
> of 1-63, default 5. This doesn't tell me if a value of 1 is considered a
> higher priority than 63, and it doesn't tell me how it fits in line with
> other ceph operations.

I'm not seeing this in the luminous docs, are you sure? The source code
indicates that in Luminous it's 0-254. (As I said, things have changed, so in
the current master build it seems to be -10 to 10 and configured a bit
differently.)

The 1-63 values generally apply to op priorities within the OSD, and are used
as part of a weighted priority queue when selecting the next op to work on
out of those available. You may have been looking at
osd_recovery_op_priority, which is on that scale and applies to individual
recovery messages/ops, but it will not work to schedule PGs differently.

> Questions:
> 1) If I have pools 1-4, what would I set these values to in order to backfill
> pools 1, 2, 3, and then 4 in order?

If I'm reading the code right, they just need to be different weights, and
the higher value will win when trying to get a reservation if there's a queue
of them. (However, it's possible that lower-priority pools will send off
requests first and get to do one or two PGs first; then the higher-priority
pool will get to do all its work before that pool continues.) There's a rough
sketch of the commands at the end of this message.

> 2) Assuming this is possible, how do I ensure that backfill isn't prioritized
> over client I/O?

This is an ongoing issue, but I don't think the pool prioritization will
change the existing mechanisms. (The usual recovery throttles still apply;
see the end of this message.)

> 3) Is there a command that enumerates the weights of the current operations
> (so that I can observe what's going on)?

"ceph osd pool ls detail" will include them.

> For context, my pools are:
> 1) cephfs_metadata
> 2) vm (RBD pool, VM OS drives)
> 3) storage (RBD pool, VM data drives)
> 4) cephfs_data
>
> These are sorted by both size (smallest to largest) and criticality of
> recovery (most to least). If there's a critique of this setup / a better way
> of organizing this, suggestions are welcome.
>
> Thanks,
> --Kyle
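To make (1) concrete, here's roughly what I'd try on your pools. I haven't
tested this on 12.2.11, so treat it as a sketch and double-check that your
version accepts recovery_priority as a pool option (and what range it takes,
since that seems to differ between releases):

    # Higher values should win the recovery reservation, so give the most
    # critical pool the largest priority:
    ceph osd pool set cephfs_metadata recovery_priority 4
    ceph osd pool set vm recovery_priority 3
    ceph osd pool set storage recovery_priority 2
    ceph osd pool set cephfs_data recovery_priority 1

    # Then verify what actually got stored:
    ceph osd pool ls detail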
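And for (2): the pool priorities don't change the client-vs-recovery balance,
so the usual throttles are still what I'd reach for. The values below are
only illustrative, not a recommendation for your hardware:

    # Injected at runtime; put the settings in ceph.conf if you want them to
    # persist across OSD restarts.
    ceph tell osd.* injectargs '--osd-max-backfills=1 --osd-recovery-max-active=1'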