Hi all, I seem to be hitting these tracker issues:
https://tracker.ceph.com/issues/23145
http://tracker.ceph.com/issues/24422

PGs 6.1 and 6.3f are having the issues.

When I list all PGs of a down OSD with:

ceph-objectstore-tool --dry-run --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ --op list-pgs

there are a lot of 'double' pgids like (also for other PGs):

6.3fs3
6.3fs5

Is that normal? I would assume different shards for EC would be on separate OSDs.

We still have 4 OSDs down and 2 PGs down+remapped, and I can't find any way to get the crashed OSDs back up.

pg 6.1 is down+remapped, acting [6,3,2147483647,29,2147483647,2147483647]
pg 6.3f is down+remapped, acting [20,24,2147483647,2147483647,3,28]

Kind regards,
Caspar Smit

2018-06-08 8:53 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:

> Update:
>
> I've unset nodown to let it continue, but now 4 OSDs are down and cannot
> be brought up again. Here's what the logfile reads:
>
> 2018-06-08 08:35:01.716245 7f4c58de4700  0 log_channel(cluster) log [INF] : 6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472] 6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to 11124'921472
> 2018-06-08 08:35:01.727261 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such file or directory not handled on operation 30 (op 0, counting from 0)
> 2018-06-08 08:35:01.727273 7f4c585e3700 -1 bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug
>
> 2018-06-08 08:35:01.730584 7f4c585e3700 -1 /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function 'void BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08 08:35:01.727379
> /home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363: FAILED assert(0 == "unexpected error")
>
> ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x558e08ba4202]
> 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
> 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546) [0x558e08a572a6]
> 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*, ObjectStore::Transaction&&, Context*, Context*, Context*, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f) [0x558e085fa37f]
> 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*, ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
> 6: (OSD::process_peering_events(std::__cxx11::list<PG*, std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
> 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*, ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
> 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
> 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
> 10: (()+0x7494) [0x7f4c709ca494]
> 11: (clone()+0x3f) [0x7f4c6fa51aff]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Any help is highly appreciated.
>
> Kind regards,
> Caspar Smit
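To add a bit of detail on what I'm looking at, and please correct me if this is
the wrong direction. For the shard question above, the intended placement can be
double-checked with something like the following; the pool and profile names are
placeholders for whatever the pool actually uses:

ceph pg map 6.3f
ceph osd pool get <poolname> erasure_code_profile
ceph osd erasure-code-profile get <profilename>

For the two down PGs, my tentative plan (untested, so treat it as a sketch) is to
stop the crashed OSDs and take offline copies of the affected shards before
attempting anything destructive, then import a copy into a healthy, stopped OSD
if needed. The file paths and the target OSD id below are just examples:

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-17/ \
    --pgid 6.3fs3 --op export --file /root/6.3fs3.export

ceph-objectstore-tool --type bluestore --data-path /var/lib/ceph/osd/ceph-30/ \
    --op import --file /root/6.3fs3.export

Is that a sane approach here, or am I likely to make things worse?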
> 2018-06-08 7:57 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:
>
>> Well I let it run with the nodown flag set and it looked like it would
>> finish, BUT it all went wrong somewhere.
>>
>> This is now the state:
>>
>>     health: HEALTH_ERR
>>             nodown flag(s) set
>>             5602396/94833780 objects misplaced (5.908%)
>>             Reduced data availability: 143 pgs inactive, 142 pgs peering, 7 pgs stale
>>             Degraded data redundancy: 248859/94833780 objects degraded (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
>>             11 stuck requests are blocked > 4096 sec
>>
>>     pgs:    13.965% pgs not active
>>             248859/94833780 objects degraded (0.262%)
>>             5602396/94833780 objects misplaced (5.908%)
>>             830 active+clean
>>             75  remapped+peering
>>             66  peering
>>             26  active+remapped+backfill_wait
>>             6   active+undersized+degraded+remapped+backfill_wait
>>             6   active+recovery_wait+degraded+remapped
>>             3   active+undersized+degraded+remapped+backfilling
>>             3   stale+active+undersized+degraded+remapped+backfill_wait
>>             3   stale+active+remapped+backfill_wait
>>             2   active+recovery_wait+degraded
>>             2   active+remapped+backfilling
>>             1   activating+degraded+remapped
>>             1   stale+remapped+peering
>>
>> # ceph health detail shows:
>>
>> REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
>>     11 ops are blocked > 16777.2 sec
>>     osds 4,7,23,24 have stuck requests > 16777.2 sec
>>
>> So what happened, and what should I do now?
>>
>> Thank you very much for any help.
>>
>> Kind regards,
>> Caspar
>>
>> 2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:
>>
>>> On Wed, 6 Jun 2018, Caspar Smit wrote:
>>> > Hi all,
>>> >
>>> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
>>> > node to it.
>>> >
>>> > osd-max-backfills is at the default 1, so backfilling didn't go very
>>> > fast, but that doesn't matter.
>>> >
>>> > Once it started backfilling everything looked ok:
>>> >
>>> > ~300 pgs in backfill_wait
>>> > ~10 pgs backfilling (~number of new OSDs)
>>> >
>>> > But I noticed the degraded objects increasing a lot. I presume a pg
>>> > that is in backfill_wait state doesn't accept any new writes anymore?
>>> > Hence the increasing degraded objects?
>>> >
>>> > So far so good, but once in a while I noticed a random OSD flapping
>>> > (they come back up automatically). This isn't because the disk is
>>> > saturated but because of a driver/controller/kernel incompatibility
>>> > which 'hangs' the disk for a short time (scsi abort_task error in
>>> > syslog). Investigating further, I noticed this was already the case
>>> > before the node expansion.
>>> >
>>> > These flapping OSDs result in lots of pg states which are a bit
>>> > worrying:
>>> >
>>> > 109 active+remapped+backfill_wait
>>> > 80  active+undersized+degraded+remapped+backfill_wait
>>> > 51  active+recovery_wait+degraded+remapped
>>> > 41  active+recovery_wait+degraded
>>> > 27  active+recovery_wait+undersized+degraded+remapped
>>> > 14  active+undersized+remapped+backfill_wait
>>> > 4   active+undersized+degraded+remapped+backfilling
>>> >
>>> > I think the recovery_wait is more important than the backfill_wait,
>>> > so I'd like to prioritize these, because the recovery_wait was
>>> > triggered by the flapping OSDs.
>>>
>>> Just a note: this is fixed in mimic.
>>> Previously, we would choose the highest-priority PG to start recovery
>>> on at the time, but once recovery had started, the appearance of a new
>>> PG with a higher priority (e.g., because it finished peering after the
>>> others) wouldn't preempt/cancel the other PG's recovery, so you would
>>> get behavior like the above.
>>>
>>> Mimic implements that preemption, so you should not see behavior like
>>> this. (If you do, then the function that assigns a priority score to a
>>> PG needs to be tweaked.)
>>>
>>> sage
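(One closing note for anyone who hits the same recovery_wait vs. backfill_wait
situation while still on Luminous: individual PGs can at least be pushed to the
front of the recovery/backfill queues by hand. I believe these commands were
added in Luminous; the pgids below are placeholders. This is only a manual
nudge, not the automatic preemption Sage describes for Mimic.)

ceph pg force-recovery <pgid> [<pgid> ...]
ceph pg force-backfill <pgid> [<pgid> ...]
ceph pg cancel-force-recovery <pgid> [<pgid> ...]
ceph pg cancel-force-backfill <pgid> [<pgid> ...]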
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com