Hi Dan,

Hmm, this looks like a MegaRAID hang / hardware failure. Curious timing: today we're doing heavy bucket deletion, and today the disk fails... just our luck.
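For reference, something along these lines should pull the relevant entries out of the kernel log (a rough sketch based on Dan's `journalctl -k` suggestion quoted below; the grep pattern and time window are assumptions, and sdi is simply the device named in the log):

    journalctl -k --since "2020-08-17 15:00" --until "2020-08-17 17:00" \
        | grep -Ei "medium error|i/o error|sdi"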
Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 81 e9 08 00 02 00 00
Aug 17 15:44:12 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 461498632
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11557 (650994209s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 0a(e0x20/s10) at 1b087380
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11558 (650994209s/0x0001/FATAL) - Uncorrectable medium error logged for VD 08/8 at 1b087380 (on PD 0a(e0x20/s10) at 1b087380)
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 15:44:56 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 15:45:27 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 81 e9 08 00 02 00 00
Aug 17 16:38:43 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 461498632
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 16:38:51 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 08 72 f0 00 02 00 00
Aug 17 16:38:58 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 453538544

I just replaced the disk; there were no SMART errors beforehand.

Regards,
Manuel

-----Original Message-----
From: Dan van der Ster <d...@vanderster.com>
Sent: Monday, 17 August 2020 17:31
To: EDH - Manuel Rios <mrios...@easydatahost.com>
CC: ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] OSD RGW Index 14.2.11 crash

Hi,

Do you have SCSI errors around the time of the crash? Check `journalctl -k` and look for SCSI medium errors.

Cheers, Dan

On Mon, Aug 17, 2020 at 3:50 PM EDH - Manuel Rios <mrios...@easydatahost.com> wrote:
>
> Hi, today one of our SSDs dedicated to the RGW index crashed; maybe a bug, or the OSD simply crashed.
>
> Our current version is 14.2.11, and today we're under heavy object processing... approx. 60 TB of data.
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x563f96b550e5]
> 2: (()+0x4d72ad) [0x563f96b552ad]
> 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
> 4: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
> 5: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
> 6: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
> 7: (()+0x102fd29) [0x563f976add29]
> 8: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
> 9: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
> 10: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
> 11: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
> 12: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
> 13: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
> 14: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
> 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
> 16: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
> 17: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
> 18: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
> 19: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
> 20: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
> 21: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
> 22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
> 23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
> 24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
> 25: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
> 26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
> 27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
> 28: (()+0x7dd5) [0x7f1e504eddd5]
> 29: (clone()+0x6d) [0x7f1e4f3ad02d]
>
> 0> 2020-08-17 15:45:27.609 7f1e2fa82700 -1 *** Caught signal (Aborted) ** in thread 7f1e2fa82700 thread_name:tp_osd_tp
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
> 1: (()+0xf5d0) [0x7f1e504f55d0]
> 2: (gsignal()+0x37) [0x7f1e4f2e52c7]
> 3: (abort()+0x148) [0x7f1e4f2e69b8]
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x563f96b55134]
> 5: (()+0x4d72ad) [0x563f96b552ad]
> 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, char*)+0xf0e) [0x563f9715aa9e]
> 7: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned long)+0x2a) [0x563f9718453a]
> 8: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
> 9: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
> 10: (()+0x102fd29) [0x563f976add29]
> 11: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
> 12: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
> 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) [0x563f975b36bd]
> 14: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string const&, std::string const&, std::string const&)+0x567) [0x563f975beab7]
> 15: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string const&, unsigned long)+0x72) [0x563f9708f2f2]
> 16: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
> 17: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
> 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
> 19: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
> 20: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
> 21: (non-virtual thunk to PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
> 22: (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) [0x563f96f2a970]
> 23: (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0x298) [0x563f96f32d38]
> 24: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) [0x563f96e4486a]
> 25: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
> 26: (OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) [0x563f96c34da2]
> 27: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
> 28: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
> 29: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) [0x563f97203c46]
> 30: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x563f97206760]
> 31: (()+0x7dd5) [0x7f1e504eddd5]
> 32: (clone()+0x6d) [0x7f1e4f3ad02d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> Any ideas or similar situation?
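In case it is useful to anyone hitting the same thing, this is roughly how the crashed OSD and the drive behind the MegaRAID controller could be checked (a sketch only, not taken from this thread: the OSD id is a placeholder, and the megaraid device index 10 is a guess based on PD 0a(e0x20/s10) in the kernel log, so it may need adjusting):

    # Placeholder OSD id; replace with the id of the OSD that crashed (not given in this thread)
    OSD_ID=0

    # Stop the OSD, then run a consistency check of its BlueStore/BlueFS data
    systemctl stop ceph-osd@${OSD_ID}
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-${OSD_ID}

    # Query SMART through the MegaRAID controller; ",10" is assumed from slot 10 in the log
    smartctl -a -d megaraid,10 /dev/sdi

If the medium errors are confirmed, the usual route is to replace the disk and redeploy the OSD rather than trying to repair it in place.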
>
>
> Manuel Ríos Fernández
> CEO - Business development
> 677677179
> mrios...@easydatahost.com

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io