Well, I let it run with the nodown flag set and it looked like it would
finish, BUT it all went wrong somewhere:

This is now the state:

    health: HEALTH_ERR
            nodown flag(s) set
            5602396/94833780 objects misplaced (5.908%)
            Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
            Degraded data redundancy: 248859/94833780 objects degraded (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
            11 stuck requests are blocked > 4096 sec

    pgs:     13.965% pgs not active
             248859/94833780 objects degraded (0.262%)
             5602396/94833780 objects misplaced (5.908%)
             830 active+clean
             75  remapped+peering
             66  peering
             26  active+remapped+backfill_wait
             6   active+undersized+degraded+remapped+backfill_wait
             6   active+recovery_wait+degraded+remapped
             3   active+undersized+degraded+remapped+backfilling
             3   stale+active+undersized+degraded+remapped+backfill_wait
             3   stale+active+remapped+backfill_wait
             2   active+recovery_wait+degraded
             2   active+remapped+backfilling
             1   activating+degraded+remapped
             1   stale+remapped+peering


# ceph health detail shows:

REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
    11 ops are blocked > 16777.2 sec
    osds 4,7,23,24 have stuck requests > 16777.2 sec
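
To see what the stuck requests on osds 4, 7, 23 and 24 are actually blocked
on, something like this via the admin socket on the hosts running those OSDs
should help (a sketch, assuming the default admin socket setup):

    # run on the node hosting osd.4; repeat for osd.7, osd.23 and osd.24
    ceph daemon osd.4 dump_blocked_ops
    ceph daemon osd.4 dump_ops_in_flight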


So what happened, and what should I do now?

Thank you very much for any help

Kind regards,
Caspar


2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:

> On Wed, 6 Jun 2018, Caspar Smit wrote:
> > Hi all,
> >
> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
> > node to it.
> >
> > osd-max-backfills is at the default of 1, so backfilling didn't go very
> > fast, but that doesn't matter.
> >
> > Once it started backfilling, everything looked OK:
> >
> > ~300 pgs in backfill_wait
> > ~10 pgs backfilling (~ number of new OSDs)
> >
> > But I noticed the number of degraded objects increasing a lot. I presume
> > a PG that is in the backfill_wait state doesn't accept any new writes
> > anymore? Hence the increasing degraded objects?
> >
> > So far so good, but once in a while I noticed a random OSD flapping
> > (they come back up automatically). This isn't because the disk is
> > saturated, but because of a driver/controller/kernel incompatibility
> > which 'hangs' the disk for a short time (scsi abort_task error in
> > syslog). Investigating further, I noticed this was already the case
> > before the node expansion.
> >
> > These flapping OSDs result in lots of PG states which are a bit
> > worrying:
> >
> >              109 active+remapped+backfill_wait
> >              80  active+undersized+degraded+remapped+backfill_wait
> >              51  active+recovery_wait+degraded+remapped
> >              41  active+recovery_wait+degraded
> >              27  active+recovery_wait+undersized+degraded+remapped
> >              14  active+undersized+remapped+backfill_wait
> >              4   active+undersized+degraded+remapped+backfilling
> >
> > I think the recovery_wait is more important than the backfill_wait, so
> > I'd like to prioritize those, because the recovery_wait was triggered by
> > the flapping OSDs.
>
> Just a note: this is fixed in mimic.  Previously, we would choose the
> highest-priority PG to start recovery on at the time, but once recovery
> had started, the appearance of a new PG with a higher priority (e.g.,
> because it finished peering after the others) wouldn't preempt/cancel the
> other PG's recovery, so you would get behavior like the above.
>
> Mimic implements that preemption, so you should not see behavior like
> this.  (If you do, then the function that assigns a priority score to a
> PG needs to be tweaked.)
>
> sage
>
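
PS: regarding the prioritization question in the quoted thread above: if the
force-recovery / force-backfill commands are available in this Luminous
release, they could be used to manually bump the worst PGs in the recovery
queue. A rough sketch, with made-up PG ids:

    # push specific degraded PGs to the front of the recovery queue
    ceph pg force-recovery 1.2f 1.30

    # undo it again if needed
    ceph pg cancel-force-recovery 1.2f 1.30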