Well, I let it run with the nodown flag set and it looked like it would finish, BUT it all went wrong somewhere:
This is now the state:

  health: HEALTH_ERR
          nodown flag(s) set
          5602396/94833780 objects misplaced (5.908%)
          Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
          Degraded data redundancy: 248859/94833780 objects degraded (0.262%),
          194 pgs unclean, 21 pgs degraded, 12 pgs undersized
          11 stuck requests are blocked > 4096 sec

  pgs:    13.965% pgs not active
          248859/94833780 objects degraded (0.262%)
          5602396/94833780 objects misplaced (5.908%)
          830 active+clean
           75 remapped+peering
           66 peering
           26 active+remapped+backfill_wait
            6 active+undersized+degraded+remapped+backfill_wait
            6 active+recovery_wait+degraded+remapped
            3 active+undersized+degraded+remapped+backfilling
            3 stale+active+undersized+degraded+remapped+backfill_wait
            3 stale+active+remapped+backfill_wait
            2 active+recovery_wait+degraded
            2 active+remapped+backfilling
            1 activating+degraded+remapped
            1 stale+remapped+peering

ceph health detail shows:

  REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
      11 ops are blocked > 16777.2 sec
      osds 4,7,23,24 have stuck requests > 16777.2 sec

So what happened, and what should I do now?

Thank you very much for any help.

Kind regards,
Caspar

2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:
> On Wed, 6 Jun 2018, Caspar Smit wrote:
> > Hi all,
> >
> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
> > node to it.
> >
> > osd-max-backfills is at the default 1, so backfilling didn't go very
> > fast, but that doesn't matter.
> >
> > Once it started backfilling everything looked OK:
> >
> > ~300 pgs in backfill_wait
> > ~10 pgs backfilling (~number of new OSDs)
> >
> > But I noticed the number of degraded objects increasing a lot. I presume
> > a pg that is in backfill_wait state doesn't accept any new writes
> > anymore? Hence the increasing degraded objects?
> >
> > So far so good, but once in a while I noticed a random OSD flapping
> > (they come back up automatically). This isn't because the disk is
> > saturated, but a driver/controller/kernel incompatibility which 'hangs'
> > the disk for a short time (scsi abort_task error in syslog).
> > Investigating further, I noticed this was already the case before the
> > node expansion.
> >
> > These flapping OSDs result in lots of pg states which are a bit
> > worrying:
> >
> > 109 active+remapped+backfill_wait
> >  80 active+undersized+degraded+remapped+backfill_wait
> >  51 active+recovery_wait+degraded+remapped
> >  41 active+recovery_wait+degraded
> >  27 active+recovery_wait+undersized+degraded+remapped
> >  14 active+undersized+remapped+backfill_wait
> >   4 active+undersized+degraded+remapped+backfilling
> >
> > I think the recovery_wait is more important than the backfill_wait, so
> > I'd like to prioritize those, because the recovery_wait was triggered by
> > the flapping OSDs.
>
> Just a note: this is fixed in mimic. Previously, we would choose the
> highest-priority PG to start recovery on at the time, but once recovery
> had started, the appearance of a new PG with a higher priority (e.g.,
> because it finished peering after the others) wouldn't preempt/cancel the
> other PG's recovery, so you would get behavior like the above.
>
> Mimic implements that preemption, so you should not see behavior like
> this. (If you do, then the function that assigns a priority score to a
> PG needs to be tweaked.)
>
> sage