Update: I've unset nodown to let it continue, but now 4 OSDs are down and cannot be brought up again. Here's what the logfile reads:
2018-06-08 08:35:01.716245 7f4c58de4700 0 log_channel(cluster) log [INF] : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472] 6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to 11124'921472
2018-06-08 08:35:01.727261 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 0, counting from 0)
2018-06-08 08:35:01.727273 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug
2018-06-08 08:35:01.730584 7f4c585e3700 -1 /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08 08:35:01.727379
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")

 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558e08ba4202]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x558e08a572a6]
 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x558e085fa37f]
 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
 6: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
 10: (()+0x7494) [0x7f4c709ca494]
 11: (clone()+0x3f) [0x7f4c6fa51aff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Any help is highly appreciated.
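As a first pass (not a fix, just a way to see what the crashing OSD is tripping over), something along these lines could be run on the node holding osd.16. This is only a sketch: it assumes the default log path /var/log/ceph/ceph-osd.16.log, uses the data path and PG (6.e3s0) taken from the log above, and the export file name is made up. The ceph-objectstore-tool export/remove at the end is a last-resort idea for getting a crashing OSD to start again by moving the problematic PG shard out of its store; it is destructive, so only consider it if the other shards of that PG are healthy, and keep the export as a backup (some tool versions also want --force on the remove).

  # confirm which PG/object the assert fires on (lines just before "FAILED assert")
  grep -B20 'FAILED assert' /var/log/ceph/ceph-osd.16.log | tail -40

  # how the cluster currently sees that PG (6.e3, shard s0 sits on the dead OSD)
  ceph pg 6.e3 query
  ceph pg dump_stuck inactive

  # last resort, with the OSD stopped and an export kept as a backup
  systemctl stop ceph-osd@16
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --pgid 6.e3s0 --op export --file /root/6.e3s0.export
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-16 \
      --pgid 6.e3s0 --op remove
  systemctl start ceph-osd@16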
Kind regards,
Caspar Smit

2018-06-08 7:57 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:
> Well i let it run with flags nodown and it looked like it would finish BUT
> it all went wrong somewhere:
>
> This is now the state:
>
>   health: HEALTH_ERR
>           nodown flag(s) set
>           5602396/94833780 objects misplaced (5.908%)
>           Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
>           Degraded data redundancy: 248859/94833780 objects degraded (0.262%),
>           194 pgs unclean, 21 pgs degraded, 12 pgs undersized
>           11 stuck requests are blocked > 4096 sec
>
>   pgs:    13.965% pgs not active
>           248859/94833780 objects degraded (0.262%)
>           5602396/94833780 objects misplaced (5.908%)
>           830 active+clean
>            75 remapped+peering
>            66 peering
>            26 active+remapped+backfill_wait
>             6 active+undersized+degraded+remapped+backfill_wait
>             6 active+recovery_wait+degraded+remapped
>             3 active+undersized+degraded+remapped+backfilling
>             3 stale+active+undersized+degraded+remapped+backfill_wait
>             3 stale+active+remapped+backfill_wait
>             2 active+recovery_wait+degraded
>             2 active+remapped+backfilling
>             1 activating+degraded+remapped
>             1 stale+remapped+peering
>
> #ceph health detail shows:
>
> REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
>     11 ops are blocked > 16777.2 sec
>     osds 4,7,23,24 have stuck requests > 16777.2 sec
>
> So what happened and what should i do now?
>
> Thank you very much for any help
>
> Kind regards,
> Caspar
>
> 2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:
>> On Wed, 6 Jun 2018, Caspar Smit wrote:
>> > Hi all,
>> >
>> > We have a Luminous 12.2.2 cluster with 3 nodes and i recently added a node
>> > to it.
>> >
>> > osd-max-backfills is at the default 1 so backfilling didn't go very fast
>> > but that doesn't matter.
>> >
>> > Once it started backfilling everything looked ok:
>> >
>> > ~300 pgs in backfill_wait
>> > ~10 pgs backfilling (~number of new osd's)
>> >
>> > But i noticed the degraded objects increasing a lot. I presume a pg that is
>> > in backfill_wait state doesn't accept any new writes anymore? Hence
>> > increasing the degraded objects?
>> >
>> > So far so good, but once a while i noticed a random OSD flapping (they come
>> > back up automatically). This isn't because the disk is saturated but a
>> > driver/controller/kernel incompatibility which 'hangs' the disk for a short
>> > time (scsi abort_task error in syslog). Investigating further i noticed
>> > this was already the case before the node expansion.
>> >
>> > These OSD's flapping results in lots of pg states which are a bit worrying:
>> >
>> > 109 active+remapped+backfill_wait
>> >  80 active+undersized+degraded+remapped+backfill_wait
>> >  51 active+recovery_wait+degraded+remapped
>> >  41 active+recovery_wait+degraded
>> >  27 active+recovery_wait+undersized+degraded+remapped
>> >  14 active+undersized+remapped+backfill_wait
>> >   4 active+undersized+degraded+remapped+backfilling
>> >
>> > I think the recovery_wait is more important then the backfill_wait, so i
>> > like to prioritize these because the recovery_wait was triggered by the
>> > flapping OSD's
>>
>> Just a note: this is fixed in mimic.  Previously, we would choose the
>> highest-priority PG to start recovery on at the time, but once recovery
>> had started, the appearance of a new PG with a higher priority (e.g.,
>> because it finished peering after the others) wouldn't preempt/cancel the
>> other PG's recovery, so you would get behavior like the above.
>>
>> Mimic implements that preemption, so you should not see behavior like
>> this.  (If you do, then the function that assigns a priority score to a
>> PG needs to be tweaked.)
>>
>> sage
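On luminous, where that preemption doesn't exist yet, a rough sketch of what can still be done by hand. Assumptions: the OSD ids (4, 7, 23, 24) come from the health detail above, <pgid> is a placeholder for the PGs you actually want bumped, and the force-recovery/force-backfill commands are only there if your 12.2.x build already ships them (they were added during the luminous cycle).

  # see what the blocked requests on the named OSDs are actually waiting for
  # (run on the host that carries the OSD)
  ceph daemon osd.4 dump_blocked_ops
  ceph daemon osd.4 dump_ops_in_flight

  # which OSDs/PGs are holding up the inactive and unclean PGs
  ceph osd blocked-by
  ceph pg dump_stuck inactive
  ceph pg dump_stuck unclean

  # push specific degraded PGs ahead of the plain backfill_wait ones
  ceph pg force-recovery <pgid> [<pgid> ...]
  ceph pg force-backfill <pgid> [<pgid> ...]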