Update:

I've unset nodown to let it continue, but now 4 OSDs are down and cannot be
brought up again. Here's what the logfile reads:

2018-06-08 08:35:01.716245 7f4c58de4700  0 log_channel(cluster) log [INF] :
6.e3s0 continuing backfill to osd.37(4) from (10864'911406,11124'921472]
6:c7d71bbd:::rbd_data.5.6c1d9574b0dc51.0000000000bf38b9:head to 11124'921472
2018-06-08 08:35:01.727261 7f4c585e3700 -1
bluestore(/var/lib/ceph/osd/ceph-16) _txc_add_transaction error (2) No such
file or directory not handled on operation 30 (op 0, counting from 0)
2018-06-08 08:35:01.727273 7f4c585e3700 -1
bluestore(/var/lib/ceph/osd/ceph-16) ENOENT on clone suggests osd bug

2018-06-08 08:35:01.730584 7f4c585e3700 -1
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: In function
'void BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)' thread 7f4c585e3700 time 2018-06-08
08:35:01.727379
/home/builder/source/ceph-12.2.2/src/os/bluestore/BlueStore.cc: 9363:
FAILED assert(0 == "unexpected error")

 ceph version 12.2.2 (215dd7151453fae88e6f968c975b6ce309d42dcf) luminous
(stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x102) [0x558e08ba4202]
 2: (BlueStore::_txc_add_transaction(BlueStore::TransContext*,
ObjectStore::Transaction*)+0x15fa) [0x558e08a55c3a]
 3: (BlueStore::queue_transactions(ObjectStore::Sequencer*,
std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x546)
[0x558e08a572a6]
 4: (ObjectStore::queue_transaction(ObjectStore::Sequencer*,
ObjectStore::Transaction&&, Context*, Context*, Context*,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x14f)
[0x558e085fa37f]
 5: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
ThreadPool::TPHandle*)+0x6c) [0x558e0857db5c]
 6: (OSD::process_peering_events(std::__cxx11::list<PG*,
std::allocator<PG*> > const&, ThreadPool::TPHandle&)+0x442) [0x558e085abec2]
 7: (ThreadPool::BatchWorkQueue<PG>::_void_process(void*,
ThreadPool::TPHandle&)+0x2c) [0x558e0861a91c]
 8: (ThreadPool::worker(ThreadPool::WorkThread*)+0xeb8) [0x558e08bab3a8]
 9: (ThreadPool::WorkThread::entry()+0x10) [0x558e08bac540]
 10: (()+0x7494) [0x7f4c709ca494]
 11: (clone()+0x3f) [0x7f4c6fa51aff]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.
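
For reference, a minimal sketch of the kind of commands that can be used to
inspect the down OSDs and the assert above, assuming the standard systemd unit
names, the default log location, and that the ceph-osd binary lives at
/usr/bin/ceph-osd (adjust ids and paths as needed):

# which OSDs are down, and is the daemon crash-looping?
ceph osd tree
systemctl status ceph-osd@16
# the full assert plus surrounding context ends up in the OSD log
less /var/log/ceph/ceph-osd.16.log
# per the NOTE above, a disassembly of the binary helps interpret the trace
objdump -rdS /usr/bin/ceph-osd > ceph-osd.objdump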

Any help is highly appreciated.

Kind regards,
Caspar Smit


2018-06-08 7:57 GMT+02:00 Caspar Smit <caspars...@supernas.eu>:

> Well, I let it run with the nodown flag set and it looked like it would
> finish, but it all went wrong somewhere:
>
> This is now the state:
>
>     health: HEALTH_ERR
>             nodown flag(s) set
>             5602396/94833780 objects misplaced (5.908%)
>             Reduced data availability: 143 pgs inactive, 142 pgs peering,
> 7 pgs stale
>             Degraded data redundancy: 248859/94833780 objects degraded
> (0.262%), 194 pgs unclean, 21 pgs degraded, 12 pgs undersized
>             11 stuck requests are blocked > 4096 sec
>
>     pgs:     13.965% pgs not active
>              248859/94833780 objects degraded (0.262%)
>              5602396/94833780 objects misplaced (5.908%)
>              830 active+clean
>              75  remapped+peering
>              66  peering
>              26  active+remapped+backfill_wait
>              6   active+undersized+degraded+remapped+backfill_wait
>              6   active+recovery_wait+degraded+remapped
>              3   active+undersized+degraded+remapped+backfilling
>              3   stale+active+undersized+degraded+remapped+backfill_wait
>              3   stale+active+remapped+backfill_wait
>              2   active+recovery_wait+degraded
>              2   active+remapped+backfilling
>              1   activating+degraded+remapped
>              1   stale+remapped+peering
>
>
> #ceph health detail shows:
>
> REQUEST_STUCK 11 stuck requests are blocked > 4096 sec
>     11 ops are blocked > 16777.2 sec
>     osds 4,7,23,24 have stuck requests > 16777.2 sec
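>
> A minimal sketch of how those stuck requests could be inspected further,
> assuming the default admin socket locations (the ceph daemon commands have to
> be run on the node that hosts the OSD in question):
>
> # show the ops an OSD has had blocked for a long time, and everything in flight
> ceph daemon osd.4 dump_blocked_ops
> ceph daemon osd.4 dump_ops_in_flight
> # list PGs stuck inactive/peering and query one of them for details
> ceph pg dump_stuck inactive
> ceph pg <pgid> query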
>
>
> So what happened, and what should I do now?
>
> Thank you very much for any help
>
> Kind regards,
> Caspar
>
>
> 2018-06-07 13:33 GMT+02:00 Sage Weil <s...@newdream.net>:
>
>> On Wed, 6 Jun 2018, Caspar Smit wrote:
>> > Hi all,
>> >
>> > We have a Luminous 12.2.2 cluster with 3 nodes and I recently added a
>> > node to it.
>> >
>> > osd-max-backfills is at the default of 1, so backfilling didn't go very
>> > fast, but that doesn't matter.
>> >
>> > Once it started backfilling everything looked ok:
>> >
>> > ~300 pgs in backfill_wait
>> > ~10 pgs backfilling (~ the number of new OSDs)
>> >
>> > But I noticed the degraded objects increasing a lot. I presume a PG that
>> > is in backfill_wait state doesn't accept any new writes anymore? Hence the
>> > increasing degraded object count?
>> >
>> > So far so good, but once in a while I noticed a random OSD flapping (they
>> > come back up automatically). This isn't because the disk is saturated, but
>> > because of a driver/controller/kernel incompatibility which 'hangs' the
>> > disk for a short time (scsi abort_task error in syslog). Investigating
>> > further, I noticed this was already the case before the node expansion.
>> >
>> > These flapping OSDs result in lots of PG states which are a bit worrying:
>> >
>> >              109 active+remapped+backfill_wait
>> >              80  active+undersized+degraded+remapped+backfill_wait
>> >              51  active+recovery_wait+degraded+remapped
>> >              41  active+recovery_wait+degraded
>> >              27  active+recovery_wait+undersized+degraded+remapped
>> >              14  active+undersized+remapped+backfill_wait
>> >              4   active+undersized+degraded+remapped+backfilling
>> >
>> > I think the recovery_wait is more important than the backfill_wait, so
>> > I'd like to prioritize those, because the recovery_wait was triggered by
>> > the flapping OSDs.
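>> >
>> > For reference, a minimal sketch of the kind of commands that could be used
>> > for this, assuming the per-PG force-recovery/force-backfill commands added
>> > in Luminous are available in this 12.2.2 build, and with <pgid> standing
>> > in for one of the PGs currently shown in recovery_wait:
>> >
>> > # bump the recovery priority of specific PGs ahead of queued work
>> > ceph pg force-recovery <pgid> [<pgid>...]
>> > # or prioritize a specific backfill instead
>> > ceph pg force-backfill <pgid> [<pgid>...]
>> > # and undo it again once things settle
>> > ceph pg cancel-force-recovery <pgid>
>> >
>> > As far as I know these only raise the priority of the named PGs in the
>> > recovery/backfill queues; they don't change osd-max-backfills or any
>> > other throttle.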
>>
>> Just a note: this is fixed in mimic.  Previously, we would choose the
>> highest-priority PG to start recovery on at the time, but once recovery
>> had started, the appearance of a new PG with a higher priority (e.g.,
>> because it finished peering after the others) wouldn't preempt/cancel the
>> other PG's recovery, so you would get behavior like the above.
>>
>> Mimic implements that preemption, so you should not see behavior like
>> this.  (If you do, then the function that assigns a priority score to a
>> PG needs to be tweaked.)
>>
>> sage
>>
>
>