Hi all, I seem to be hitting these tracker issues:
https://tracker.ceph.com/issues/23145
http://tracker.ceph.com/issues/24422

PGs 6.1 and 6.3f are having the issues.

When I list all PGs of a down OSD with:

ceph-objectstore-tool --dry-run --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ --op list-pgs

there are a lot of 'double' pgids like (also for other PGs):

6.3fs3
6.3fs5

Is that normal? I would assume different shards for EC would be on separate OSDs.

We still have 4 OSDs down and 2 PGs down+remapped, and I can't find any way to get the crashed OSDs back up.

pg 6.1 is down+remapped, acting [6,3,2147483647,29,2147483647,2147483647]
pg 6.3f is down+remapped, acting [20,24,2147483647,2147483647,3,28]

Kind regards,
Caspar Smit

2018-06-08 8:53 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:

> Update:
>
> I've unset nodown to let it continue, but now 4 OSDs are down and cannot
> be brought up again. Here's what the logfile reads:
>
> 2018-06-08 08:35:01.716245 7f4c58de4700  0 log_channel(cluster) log [INF] : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472] 6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to 11124'921472
> 2018-06-08 08:35:01.727261 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 0, counting from 0)
> 2018-06-08 08:35:01.727273 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug
>
> 2018-06-08 08:35:01.730584 7f4c585e3700 -1 /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08 08:35:01.727379
> /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")
>
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558e08ba4202]
> 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
> 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x558e08a572a6]
> 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x558e085fa37f]
> 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
> 6: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
> 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
> 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
> 10: (()+0x7494) [0x7f4c709ca494]
> 11: (clone()+0x3f) [0x7f4c6fa51aff]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Any help is highly appreciated.
>
> Kind regards,
> Caspar Smit
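To add a bit of detail on what I'm looking at, and please correct me if this is
the wrong direction. For the shard question above, the intended placement can be
double-checked with something like the following; the pool and profile names are
placeholders for whatever the pool actually uses:

ceph pg map 6.3f
ceph osd pool get <poolname> erasure_code_profile
ceph osd erasure-code-profile get <profilename>

For the two down PGs, my tentative plan (untested, so treat it as a sketch) is to
stop the crashed OSDs and take offline copies of the affected shards before
attempting anything destructive, then import a copy into a healthy, stopped OSD
if needed. The file paths and the target OSD id below are just examples:

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ \
    --pgid 6.3fs3 --op export --file /root/6.3fs3.export

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-30/ \
    --op import --file /root/6.3fs3.export

Is that a sane approach here, or am I likely to make things worse?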
> 2018-06-08 7:57 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:
>
>> Well I let it run with the nodown flag set and it looked like it would
>> finish, BUT it all went wrong somewhere.
>>
>> This is now the state:
>>
>>     health: HEALTH_ERR
>>             nodown flag(s) set
>>             5602396/94833780 objects misplaced (5.908%)
>>             Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
>>             Degraded data redundancy: 248859/94833780 objects degraded (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
>>             11 stuck requests are blocked > 4096 sec
>>
>>     pgs:    13.965% pgs not active
>>             248859/94833780 objects degraded (0.262%)
>>             5602396/94833780 objects misplaced (5.908%)
>>             830 active+clean
>>             75  remapped+peering
>>             66  peering
>>             26  active+remapped+backfill_wait
>>             6   active+undersized+degraded+remapped+backfill_wait
>>             6   active+recovery_wait+degraded+remapped
>>             3   active+undersized+degraded+remapped+backfilling
>>             3   stale+active+undersized+degraded+remapped+backfill_wait
>>             3   stale+active+remapped+backfill_wait
>>             2   active+recovery_wait+degraded
>>             2   active+remapped+backfilling
>>             1   activating+degraded+remapped
>>             1   stale+remapped+peering
>>
>> # ceph health detail shows:
>>
>> REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
>>     11 ops are blocked > 16777.2 sec
>>     osds 4,7,23,24 have stuck requests > 16777.2 sec
>>
>> So what happened, and what should I do now?
>>
>> Thank you very much for any help.
>>
>> Kind regards,
>> Caspar
>>
>> 2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:
>>
>>> On Wed, 6 Jun 2018, Caspar Smit wrote:
>>> > Hi all,
>>> >
>>> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
>>> > node to it.
>>> >
>>> > osd-max-backfills is at the default 1, so backfilling didn't go very
>>> > fast, but that doesn't matter.
>>> >
>>> > Once it started backfilling everything looked ok:
>>> >
>>> > ~300 pgs in backfill_wait
>>> > ~10 pgs backfilling (~number of new OSDs)
>>> >
>>> > But I noticed the degraded objects increasing a lot. I presume a pg
>>> > that is in backfill_wait state doesn't accept any new writes anymore?
>>> > Hence the increasing degraded objects?
>>> >
>>> > So far so good, but once in a while I noticed a random OSD flapping
>>> > (they come back up automatically). This isn't because the disk is
>>> > saturated but because of a driver/controller/kernel incompatibility
>>> > which 'hangs' the disk for a short time (scsi abort_task error in
>>> > syslog). Investigating further, I noticed this was already the case
>>> > before the node expansion.
>>> >
>>> > These flapping OSDs result in lots of pg states which are a bit
>>> > worrying:
>>> >
>>> > 109 active+remapped+backfill_wait
>>> > 80  active+undersized+degraded+remapped+backfill_wait
>>> > 51  active+recovery_wait+degraded+remapped
>>> > 41  active+recovery_wait+degraded
>>> > 27  active+recovery_wait+undersized+degraded+remapped
>>> > 14  active+undersized+remapped+backfill_wait
>>> > 4   active+undersized+degraded+remapped+backfilling
>>> >
>>> > I think the recovery_wait is more important than the backfill_wait,
>>> > so I'd like to prioritize these, because the recovery_wait was
>>> > triggered by the flapping OSDs.
>>>
>>> Just a note: this is fixed in mimic.
>>> Previously, we would choose the highest-priority PG to start recovery
>>> on at the time, but once recovery had started, the appearance of a new
>>> PG with a higher priority (e.g., because it finished peering after the
>>> others) wouldn't preempt/cancel the other PG's recovery, so you would
>>> get behavior like the above.
>>>
>>> Mimic implements that preemption, so you should not see behavior like
>>> this. (If you do, then the function that assigns a priority score to a
>>> PG needs to be tweaked.)
>>>
>>> sage
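(One closing note for anyone who hits the same recovery_wait vs. backfill_wait
situation while still on Luminous: individual PGs can at least be pushed to the
front of the recovery/backfill queues by hand. I believe these commands were
added in Luminous; the pgids below are placeholders. This is only a manual
nudge, not the automatic preemption Sage describes for Mimic.)

ceph pg force-recovery <pgid> [<pgid> ...]
ceph pg force-backfill <pgid> [<pgid> ...]
ceph pg cancel-force-recovery <pgid> [<pgid> ...]
ceph pg cancel-force-backfill <pgid> [<pgid> ...]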
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com