Roeland,

We're seeing the same problems in our cluster.  I can't offer you a
solution that gets the OSD back, but I can tell you what I did to work
around it.

We're running five 0.94.6 clusters with 9 nodes / 648 HDD OSDs and a k=7,
m=2 erasure-coded .rgw.buckets pool.  During the backfilling after a recent
disk replacement, we had four OSDs that got into a very similar state.

2016-08-09 07:40:12.475699 7f025b06b700 -1 osd/ECBackend.cc: In function
'void ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)' thread
7f025b06b700 time 2016-08-09 07:40:12.472819
osd/ECBackend.cc: 281: FAILED assert(op.attrset.count(string("_")))

 ceph version 0.94.6-2 (f870be457b16e4ff56ced74ed3a3c9a4c781f281)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x8b) [0xba997b]
 2: (ECBackend::handle_recovery_push(PushOp&, RecoveryMessages*)+0xd7f)
[0xa239ff]
 3: (ECBackend::handle_message(std::tr1::shared_ptr<OpRequest>)+0x1de)
[0xa2600e]
 4: (ReplicatedPG::do_request(std::tr1::shared_ptr<OpRequest>&,
ThreadPool::TPHandle&)+0x167) [0x8305e7]
 5: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x6a157d]
 6: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x338) [0x6a1aa8]
 7: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f)
[0xb994cf]
 8: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xb9b5f0]
 9: (()+0x8184) [0x7f0284e35184]
 10: (clone()+0x6d) [0x7f028324c37d]

To allow the cluster to recover, we ended up reweighting the OSDs that got
into this state to 0 (ceph osd crush reweight osd.<id> 0).  This of course
kicks off a long round of backfilling, but the cluster does eventually
recover.  We've never found a way to get the OSD itself healthy again that
doesn't involve nuking the underlying disk and starting over.  We've had 10
OSDs get into this state across 2 clusters in the last few months, and the
failure/crash message is always the same.  If someone does know of a way to
recover such an OSD, that would be great.
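
In case it's useful, the rough sequence we follow is something like this
(the osd id and weight are placeholders, and the monitoring commands are
just the usual ones rather than anything specific to this failure):

   # take the crashing OSD out of data placement entirely
   ceph osd crush reweight osd.<id> 0

   # watch backfill/recovery progress until the cluster is healthy again
   ceph -w
   ceph health detail

   # once recovery completes, we wipe and re-create the OSD from scratch
   # rather than trying to salvage its old data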

I hope this helps.

Brian Felton

On Wed, Aug 10, 2016 at 10:17 AM, Roeland Mertens
<roeland.mert...@genomicsplc.com> wrote:

> Hi,
>
> we run a Ceph 10.2.1 cluster across 35 nodes with a total of 595 OSDs. We
> have a mixture of normally replicated pools and EC pools using the
> following erasure-code-profile:
>
> # ceph osd erasure-code-profile get rsk8m5
> jerasure-per-chunk-alignment=false
> k=8
> m=5
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
> We recently had a disk failure, and on swapping the disk out we seem to
> have hit a bug where, during recovery, OSDs crash when trying to repair
> certain PGs that may have been corrupted.
>
> For example:
>    -3> 2016-08-10 12:38:21.302938 7f893e2d7700  5 -- op tracker -- seq:
> 3434, time: 2016-08-10 12:38:21.302938, event: queued_for_pg, op:
> MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
>     -2> 2016-08-10 12:38:21.302981 7f89bef50700  1 --
> 10.93.105.11:6831/2674119 --> 10.93.105.22:6802/357033 --
> osd_map(47662..47663 src has 32224..47663) v3 -- ?+0 0x559c1057f3c0 con
> 0x559c0664a700
>     -1> 2016-08-10 12:38:21.302996 7f89bef50700  5 -- op tracker -- seq:
> 3434, time: 2016-08-10 12:38:21.302996, event: reached_pg, op:
> MOSDECSubOpReadReply(63.1a18s0 47661 ECSubReadReply(tid=1, attrs_read=0))
>      0> 2016-08-10 12:38:21.306193 7f89bef50700 -1 osd/ECBackend.cc: In
> function 'virtual void 
> OnRecoveryReadComplete::finish(std::pair<RecoveryMessages*,
> ECBackend::read_result_t&>&)' thread 7f89bef50700 time 2016-08-10
> 12:38:21.303012
> osd/ECBackend.cc: 203: FAILED assert(res.errors.empty())
>
> then the ceph-osd daemon goes splat. I've attached an extract of a logfile
> showing a bit more.
>
> Anyone have any ideas? I'm now stuck with a pg that's in
> down+remapped+peering. ceph pg query tells me that peering is blocked due
> to the loss of an OSD, though restarting that OSD just results in another
> crash of the ceph-osd daemon. We tried to force a rebuild by using
> ceph-objectstore-tool to delete the pg shard on some of the OSDs that are
> crashing, but that didn't help one iota.
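>
> For reference, the kind of ceph-objectstore-tool invocation involved looks
> roughly like this, run with the OSD stopped (the osd id, pgid and paths
> below are placeholders rather than our exact values):
>
>    # optionally export the pg shard first as a safety net
>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>        --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
>        --pgid <pgid> --op export --file /tmp/<pgid>.export
>
>    # remove the shard so the OSD can hopefully backfill it cleanly
>    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
>        --journal-path /var/lib/ceph/osd/ceph-<id>/journal \
>        --pgid <pgid> --op remove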
>
> Any help would be greatly appreciated,
>
> regards,
>
> Roeland
>