On Fri, Jun 13, 2014 at 5:25 PM, Josef Johansson <jo...@oderland.se> wrote:
> Hi Greg,
>
> Thanks for the clarification. I believe the OSD was in the middle of a deep
> scrub (sorry for not mentioning this straight away), so it could've been a
> silent error that surfaced during the scrub?

Yeah.
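
The assert in your log boils down to roughly the following check. This is
only a sketch for illustration, not the actual FileStore.cc source; the
wrapper function and the way the flags are passed in here are made up.

    #include <cassert>
    #include <cerrno>
    #include <unistd.h>

    // After the low-level read, 'got' holds the byte count on success or
    // -errno on failure; -EIO is -5, the value the assert compares against.
    ssize_t read_or_die(int fd, void *buf, size_t len, off_t offset,
                        bool allow_eio, bool m_filestore_fail_eio)
    {
      ssize_t got = ::pread(fd, buf, len, offset);
      if (got < 0)
        got = -errno;
      // Fires only when EIO is not explicitly allowed, fail-on-EIO is
      // enabled, and the read really returned EIO: the OSD aborts rather
      // than hand possibly corrupt data to the rest of the cluster.
      assert(allow_eio || !m_filestore_fail_eio || got != -EIO);
      return got;
    }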

>
> What's best practice when the store is corrupted like this?

Remove the OSD from the cluster, and either reformat the disk or
replace as you judge appropriate.
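
The usual manual removal sequence (assuming the bad OSD is osd.N; adjust the
id to yours) is roughly:

  ceph osd out N
  (stop the ceph-osd daemon on that host, then:)
  ceph osd crush remove osd.N
  ceph auth del osd.N
  ceph osd rm N

After that, reformat or swap the disk and bring it back in as a fresh OSD.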
-Greg

>
> Cheers,
> Josef
>
> Gregory Farnum wrote on 2014-06-14 02:21:
>
>> The OSD did a read off of the local filesystem and it got back the EIO
>> error code. That means the store got corrupted or something, so it
>> killed itself to avoid spreading bad data to the rest of the cluster.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson <jo...@oderland.se>
>> wrote:
>>>
>>> Hey,
>>>
>>> Just examining what happened to an OSD that was just turned off. Data has
>>> been moved away from it, so I'm hesitant to turn it back on.
>>>
>>> Got the below in the logs. Any clues as to what the assert is about?
>>>
>>> Cheers,
>>> Josef
>>>
>>> -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const
>>> hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread 7fdacb88c700
>>> time 2014-06-11 21:13:54.036982
>>> os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio ||
>>> got != -5)
>>>
>>>   ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
>>>   1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned
>>> long,
>>> ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
>>>   2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*,
>>> std::vector<OSDOp,
>>> std::allocator<OSDOp> >&)+0x350) [0x708230]
>>>   3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
>>> [0x713366]
>>>   4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3095)
>>> [0x71acb5]
>>>   5: (PG::do_request(std::tr1::shared_ptr<OpRequest>,
>>> ThreadPool::TPHandle&)+0x3f0) [0x812340]
>>>   6: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
>>>   7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>,
>>> ThreadPool::TPHandle&)+0x198) [0x770da8]
>>>   8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>,
>>> std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG>
>>> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89ce]
>>>   9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
>>>   10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
>>>   11: (()+0x6b50) [0x7fdadffdfb50]
>>>   12: (clone()+0x6d) [0x7fdade53b0ed]
>>>   NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
