Thanks for that. Seeing 'health err' so often has led to a worrisome
case of 'alarm fatigue'. Yup, that's half of what I want to do.
The number of copies of a pg in the crush map drives how time-critical
the pg repair process is and how much it depends on human intervention.
Having several copies makes automatic pg repair reasonable, but only if
there's a way to log the count of repairs filed against pgs on the same
osd since it was last marked 'in'. I'd love for reviewing that list to
be a periodic staff chore for proactive osd replacement.
Appreciate the lead for the setting.
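The per-osd repair count described above could be sketched roughly like this. It is only an illustration: RepairEvent, repairs_since_in, replacement_candidates and the threshold of 3 are hypothetical names and values, not anything Ceph provides, and the repair events and 'marked in' timestamps are assumed to come from whatever logging the site already keeps.

```python
from collections import Counter
from typing import Dict, Iterable, List, NamedTuple


class RepairEvent(NamedTuple):
    """One repair filed against a pg (hypothetical record, not a Ceph type)."""
    timestamp: float  # seconds since the epoch when the repair was issued
    pgid: str         # for example "2.1f"
    osd: int          # osd that held the bad copy


def repairs_since_in(events: Iterable[RepairEvent],
                     marked_in_at: Dict[int, float]) -> Counter:
    """Count repairs per osd, ignoring those older than the osd's last 'in'."""
    counts: Counter = Counter()
    for ev in events:
        # An osd missing from marked_in_at is treated as 'in' since time 0.
        if ev.timestamp >= marked_in_at.get(ev.osd, 0.0):
            counts[ev.osd] += 1
    return counts


def replacement_candidates(counts: Counter, threshold: int = 3) -> List[int]:
    """Return osds whose repair count has reached the (arbitrary) threshold."""
    return sorted(osd for osd, n in counts.items() if n >= threshold)
```

A staffer could then review replacement_candidates(...) on whatever schedule suits; the right threshold is a judgment call for each cluster.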
On 8/4/19 10:47 AM, Brett Chancellor wrote:
If all you want to do is repair the pg when a scrub finds it
inconsistent, you could set osd_scrub_auto_repair to true.
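For reference, a minimal sketch of applying that setting from a script. It assumes a release with the centralized config store ('ceph config set', Mimic or later, as far as I know); on older releases the option would go in ceph.conf instead. The helper names auto_repair_command and apply are made up for illustration. As far as I know, auto repair is also skipped when a scrub finds more errors than osd_scrub_auto_repair_num_errors allows, so it is worth checking that option too.

```python
import subprocess
from typing import List


def auto_repair_command(enable: bool = True) -> List[str]:
    """Build the CLI call that toggles osd_scrub_auto_repair for all osds."""
    value = "true" if enable else "false"
    return ["ceph", "config", "set", "osd", "osd_scrub_auto_repair", value]


def apply(cmd: List[str], dry_run: bool = True) -> str:
    """Return the command line; execute it only when dry_run is False."""
    line = " ".join(cmd)
    if not dry_run:
        # check=True raises CalledProcessError if the ceph CLI exits non-zero.
        subprocess.run(cmd, check=True)
    return line


if __name__ == "__main__":
    # Dry run by default: shows the command without touching the cluster.
    print(apply(auto_repair_command()))
```

The dry-run default is deliberate, so the script can be read and tried before it changes anything on a live cluster.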
On Sun, Aug 4, 2019, 9:16 AM Harry G. Coin <hgc...@gmail.com> wrote:
Question: If you have enough osds, it seems an almost daily thing that
when you get to work in the morning there's a "ceph health error", "1 pg
inconsistent", arising from a 'scrub error'. Or 2, etc. Then, like
most such mornings, you look and see there are two or more valid
instances of the pg and one with an issue. So, like putting on socks,
it just takes time every day: there's the 'ceph pg repair xx' (making
note of the likely soon-to-fail osd), then hey presto, on with the day.
Am I missing some way to automate this and be notified only if one
attempt at pg repair has failed, with just a log entry for successful
repairs? I don't need calls about dashboard "HEALTH ERR" warnings so
often.
Ideas welcome!
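One way the morning routine above might be scripted, as a rough sketch only. It assumes 'ceph health detail' reports damaged pgs on lines shaped like "pg 2.1f is active+clean+inconsistent, acting [3,7,12]"; the SAMPLE text below is illustrative, not captured from a real cluster, and inconsistent_pgs and repair_commands are hypothetical helper names. It only builds the 'ceph pg repair' commands; checking afterwards whether a repair actually succeeded, and notifying only on failure, is left out.

```python
import re
from typing import List, Tuple

# Assumed line shape: "pg 2.1f is active+clean+inconsistent, acting [3,7,12]"
_PG_LINE = re.compile(
    r"^\s*pg\s+(\S+)\s+is\s+[\w+]*inconsistent[\w+]*,\s+acting\s+\[([\d,]*)\]"
)

# Illustrative text only, standing in for `ceph health detail` output.
SAMPLE = """\
HEALTH_ERR 1 scrub errors; Possible data damage: 1 pg inconsistent
OSD_SCRUB_ERRORS 1 scrub errors
PG_DAMAGED Possible data damage: 1 pg inconsistent
    pg 2.1f is active+clean+inconsistent, acting [3,7,12]
"""


def inconsistent_pgs(health_detail: str) -> List[Tuple[str, List[int]]]:
    """Return (pgid, acting osds) for every inconsistent pg line found."""
    found = []
    for line in health_detail.splitlines():
        m = _PG_LINE.match(line)
        if m:
            acting = [int(x) for x in m.group(2).split(",") if x]
            found.append((m.group(1), acting))
    return found


def repair_commands(health_detail: str) -> List[List[str]]:
    """Build one `ceph pg repair <pgid>` command per inconsistent pg."""
    return [["ceph", "pg", "repair", pgid]
            for pgid, _acting in inconsistent_pgs(health_detail)]


if __name__ == "__main__":
    for pgid, acting in inconsistent_pgs(SAMPLE):
        # Log the acting set so the likely failing osd can be noted.
        print(f"pg {pgid} inconsistent, acting osds {acting}")
    for cmd in repair_commands(SAMPLE):
        print(" ".join(cmd))
```

In real use the text would come from running 'ceph health detail', and each command would be passed to subprocess.run once the dry output looks right.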
Thanks
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com