On Fri, Jun 13, 2014 at 5:25 PM, Josef Johansson <jo...@oderland.se> wrote:
> Hi Greg,
>
> Thanks for the clarification. I believe the OSD was in the middle of a deep
> scrub (sorry for not mentioning this straight away), so it could have been a
> silent error that only came to light during the scrub?
Yeah.

> What's best practice when the store is corrupted like this?

Remove the OSD from the cluster, and either reformat the disk or replace it,
as you judge appropriate (see the command sketch at the end of this message).
-Greg

> Cheers,
> Josef
>
> Gregory Farnum wrote 2014-06-14 02:21:
>
>> The OSD did a read off of the local filesystem and it got back the EIO
>> error code. That means the store got corrupted or something, so it
>> killed itself to avoid spreading bad data to the rest of the cluster.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>
>> On Fri, Jun 13, 2014 at 5:16 PM, Josef Johansson <jo...@oderland.se>
>> wrote:
>>>
>>> Hey,
>>>
>>> Just examining what happened to an OSD that was turned off. Data has
>>> been moved away from it, so I'm hesitant to turn it back on.
>>>
>>> Got the below in the logs; any clues as to what the assert is about?
>>>
>>> Cheers,
>>> Josef
>>>
>>> -1 os/FileStore.cc: In function 'virtual int FileStore::read(coll_t, const
>>> hobject_t&, uint64_t, size_t, ceph::bufferlist&, bool)' thread
>>> 7fdacb88c700 time 2014-06-11 21:13:54.036982
>>> os/FileStore.cc: 2992: FAILED assert(allow_eio || !m_filestore_fail_eio ||
>>> got != -5)
>>>
>>> ceph version 0.67.7 (d7ab4244396b57aac8b7e80812115bbd079e6b73)
>>> 1: (FileStore::read(coll_t, hobject_t const&, unsigned long, unsigned
>>> long, ceph::buffer::list&, bool)+0x653) [0x8ab6c3]
>>> 2: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp,
>>> std::allocator<OSDOp> >&)+0x350) [0x708230]
>>> 3: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x86)
>>> [0x713366]
>>> 4: (ReplicatedPG::do_op(std::tr1::shared_ptr<OpRequest>)+0x3095)
>>> [0x71acb5]
>>> 5: (PG::do_request(std::tr1::shared_ptr<OpRequest>,
>>> ThreadPool::TPHandle&)+0x3f0) [0x812340]
>>> 6: (OSD::dequeue_op(boost::intrusive_ptr<PG>,
>>> std::tr1::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x2ea) [0x75c80a]
>>> 7: (OSD::OpWQ::_process(boost::intrusive_ptr<PG>,
>>> ThreadPool::TPHandle&)+0x198) [0x770da8]
>>> 8: (ThreadPool::WorkQueueVal<std::pair<boost::intrusive_ptr<PG>,
>>> std::tr1::shared_ptr<OpRequest> >, boost::intrusive_ptr<PG>
>>> >::_void_process(void*, ThreadPool::TPHandle&)+0xae) [0x7a89ce]
>>> 9: (ThreadPool::worker(ThreadPool::WorkThread*)+0x68a) [0x9b5dea]
>>> 10: (ThreadPool::WorkThread::entry()+0x10) [0x9b7040]
>>> 11: (()+0x6b50) [0x7fdadffdfb50]
>>> 12: (clone()+0x6d) [0x7fdade53b0ed]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
>>> to interpret this.
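For reference, "remove the OSD from the cluster" on a dumpling-era (0.67) setup
usually comes down to a sequence along these lines. This is only a sketch: the
OSD id 12 is a placeholder for the real id, and how you stop the daemon depends
on your init system.

    # on the OSD host: stop the daemon (sysvinit shown; on upstart, "stop ceph-osd id=12")
    service ceph stop osd.12

    # from an admin node: mark the OSD out (data has already been moved off here),
    # remove it from the CRUSH map, delete its auth key, and drop it from the OSD map
    ceph osd out 12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12

After that the disk can be wiped and re-added as a fresh OSD, or pulled and
replaced outright if the drive itself looks suspect.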