All,

Whenever we do some kind of recovery operation on our Ceph clusters (a cluster expansion or dealing with a drive failure), there is a fairly noticeable performance drop while the backfills run; the last time I measured it, performance during recovery was roughly 20% of a healthy cluster. I'm wondering if there are any settings we might be missing that would improve this situation?
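For reference, this is roughly how I've been checking which backfill/recovery values are actually in effect on a given OSD (run on the host carrying that OSD; osd.0 is just an example ID):

  # Dump the running config through the OSD's admin socket and pick out
  # the backfill/recovery throttles.
  ceph daemon osd.0 config show | egrep 'osd_max_backfills|osd_recovery'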
Before doing any kind of expansion I make sure both 'noscrub' and 'nodeep-scrub' are set so that scrubbing isn't making things worse. We also have the following options set in our ceph.conf:

[osd]
osd_journal_size = 16384
osd_max_backfills = 1
osd_recovery_max_active = 1
osd_recovery_op_priority = 1
osd_recovery_max_single_start = 1
osd_op_threads = 12
osd_crush_initial_weight = 0

I'm also wondering whether there might be a way to use ionice with the CFQ scheduler to put the recovery traffic in the Idle class so that customer traffic gets higher priority?

Thanks,
Bryan
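P.S. To illustrate the ionice idea, the sort of thing I had in mind is roughly the following (assuming CFQ on the OSD data disks; pidof is just a stand-in for however you'd pick out the ceph-osd processes):

  # Put every running ceph-osd process into the CFQ Idle class (class 3).
  ionice -c 3 -p $(pidof ceph-osd)

Though I presume that would deprioritize all I/O from the daemon, client traffic included, rather than just the recovery work, which is why I'm asking whether there's a more targeted knob.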