Thanks for that. Seeing 'health err' so frequently has led to worrisome 'alarm fatigue'. Yup, that's half of what I want to do.

The number of copies of a pg in the crush map drives how time-critical, and how dependent on human intervention, the pg repair process is. Having several copies makes automatic pg repair reasonable, but only if there's a way to log the count of repairs filed against pgs on the same osd since it was last marked 'in'. I'd love to make reviewing that list a periodic staff chore for proactive osd replacement.
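
To make the bookkeeping concrete, here is a minimal sketch of that per-osd repair count. It assumes repair events are already being collected somewhere; the `RepairEvent` record, the `marked_in_at` map, and the threshold of 3 are all hypothetical illustrations, not a Ceph log format or API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class RepairEvent:
    """One logged repair. All fields are hypothetical, not a Ceph log format."""
    pg_id: str        # e.g. "2.1f"
    osd_id: int       # osd that held the bad copy
    timestamp: float  # seconds since the epoch


def repairs_since_marked_in(
    events: Iterable[RepairEvent],
    marked_in_at: Dict[int, float],
) -> Counter:
    """Count repairs per osd, skipping any from before the osd was last marked 'in'."""
    counts: Counter = Counter()
    for ev in events:
        # An osd with no recorded 'in' time has all of its repairs counted.
        if ev.timestamp >= marked_in_at.get(ev.osd_id, 0.0):
            counts[ev.osd_id] += 1
    return counts


def replacement_candidates(counts: Counter, threshold: int = 3) -> List[int]:
    """Return osd ids with at least `threshold` repairs, worst first."""
    return [osd for osd, n in counts.most_common() if n >= threshold]
```

The output of `replacement_candidates` would be the list a staffer reviews periodically; the threshold is an arbitrary starting point and would need tuning per cluster.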

Appreciate the lead for the setting.


On 8/4/19 10:47 AM, Brett Chancellor wrote:
If all you want to do is repair the pg when it finds an inconsistent pg, you could set osd_scrub_auto_repair to true.
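
A minimal sketch of turning that setting on from a script, assuming a Ceph release with the centralized config database (`ceph config set`). The helper names `auto_repair_command` and `apply` are hypothetical, and the default is a dry run so nothing touches the cluster unless asked.

```python
import subprocess
from typing import List


def auto_repair_command(enable: bool = True) -> List[str]:
    """Build the CLI call that toggles osd_scrub_auto_repair for all osds."""
    value = "true" if enable else "false"
    return ["ceph", "config", "set", "osd", "osd_scrub_auto_repair", value]


def apply(cmd: List[str], dry_run: bool = True) -> str:
    """Return the command line as text by default; run it only if dry_run is False."""
    if dry_run:
        return " ".join(cmd)
    # Raises CalledProcessError if the ceph CLI exits non-zero.
    subprocess.run(cmd, check=True)
    return "applied"
```

Calling `apply(auto_repair_command())` just returns the command string for review; pass `dry_run=False` on a host with admin credentials to actually set it.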

On Sun, Aug 4, 2019, 9:16 AM Harry G. Coin <hgc...@gmail.com> wrote:

    Question: If you have enough osds, it seems an almost daily thing that
    when you get to work in the morning there's a "ceph health error", "1 pg
    inconsistent", arising from a 'scrub error'. Or 2, etc. Then, like
    most such mornings, you look and see there are two or more valid
    instances of the pg and one with an issue. So, like putting on socks, it
    just takes time every day: there's the 'ceph pg repair xx' (making note
    of the likely soon-to-fail osd), then hey presto, on with the day.

    Am I missing some way to automate this and be notified only if an
    attempt at pg repair has failed, with just a log entry for successful
    repairs? I don't need calls about dashboard "HEALTH ERR" warnings
    this often.

    Ideas welcome!

    Thanks


    _______________________________________________
    ceph-users mailing list
    ceph-users@lists.ceph.com
    http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

