Hi Dan,

Hmm, this looks like a MegaRAID hang / hardware failure. Curious, because today we're doing heavy bucket deletions... and today the disk fails... just our luck.

Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:12 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 
81 e9 08 00 02 00 00
Aug 17 15:44:12 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
461498632
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11557 
(650994209s/0x0002/FATAL) - Unrecoverable medium error during recovery on PD 
0a(e0x20/s10) at 1b087380
Aug 17 15:44:55 CEPH003 kernel: megaraid_sas 0000:03:00.0: 11558 
(650994209s/0x0001/FATAL) - Uncorrectable medium error logged for VD 08/8 at 
1b087380 (on PD 0a(e0x20/s10) at 1b087380)
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:44:56 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#4 CDB: Read(10) 28 00 1b 
08 72 f0 00 02 00 00
Aug 17 15:44:56 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
453538544
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 15:45:27 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 
08 72 f0 00 02 00 00
Aug 17 15:45:27 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
453538544
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:43 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 
81 e9 08 00 02 00 00
Aug 17 16:38:43 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
461498632
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:51 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#1 CDB: Read(10) 28 00 1b 
08 72 f0 00 02 00 00
Aug 17 16:38:51 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
453538544
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 FAILED Result: 
hostbyte=DID_ERROR driverbyte=DRIVER_OK
Aug 17 16:38:58 CEPH003 kernel: sd 0:2:8:0: [sdi] tag#0 CDB: Read(10) 28 00 1b 
08 72 f0 00 02 00 00
Aug 17 16:38:58 CEPH003 kernel: blk_update_request: I/O error, dev sdi, sector 
453538544

Just replaced the disk; there were no SMART errors on it previously...
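
For reference, since the disk sits behind the MegaRAID controller, SMART data has to be queried through the controller itself; a minimal sketch with smartctl (the device id 10 is only an assumption taken from the "PD 0a(e0x20/s10)" entry in the kernel log above):

# Query SMART attributes of the physical drive behind the MegaRAID controller.
# "megaraid,10" is the controller's device id for the failing PD (assumed from the log).
smartctl -a -d megaraid,10 /dev/sdi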

Regards
Manuel

-----Original Message-----
From: Dan van der Ster <d...@vanderster.com> 
Sent: Monday, 17 August 2020 17:31
To: EDH - Manuel Rios <mrios...@easydatahost.com>
CC: ceph-users <ceph-users@ceph.io>
Subject: Re: [ceph-users] OSD RGW Index 14.2.11 crash

Hi,

Do you have SCSI errors around the time of the crash?
Check `journalctl -k` and look for SCSI medium errors.
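
For example, something along these lines (just a sketch; adjust the time window and grep pattern to your logs):

# Kernel messages around the time of the crash, filtered for medium / I/O errors.
journalctl -k --since "2020-08-17 15:00" --until "2020-08-17 17:00" | grep -iE "medium error|i/o error"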

Cheers, Dan


On Mon, Aug 17, 2020 at 3:50 PM EDH - Manuel Rios <mrios...@easydatahost.com> 
wrote:
>
> Hi, today one of our SSDs dedicated to the RGW index crashed; maybe a bug, or maybe the OSD just crashed.
>
> Our current version is 14.2.11, and today we're under heavy object processing... approx. 60TB of data.
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
> nautilus (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x14a) [0x563f96b550e5]
> 2: (()+0x4d72ad) [0x563f96b552ad]
> 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, 
> unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, 
> char*)+0xf0e) [0x563f9715aa9e]
> 4: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned 
> long)+0x2a) [0x563f9718453a]
> 5: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, 
> rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
> 6: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, 
> rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
> 7: (()+0x102fd29) [0x563f976add29]
> 8: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
> 9: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
> 10: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) 
> [0x563f975b36bd]
> 11: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string 
> const&, std::string const&, std::string const&)+0x567) 
> [0x563f975beab7]
> 12: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string 
> const&, unsigned long)+0x72) [0x563f9708f2f2]
> 13: (BlueStore::_do_remove(BlueStore::TransContext*, 
> boost::intrusive_ptr<BlueStore::Collection>&, 
> boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
> 14: (BlueStore::_remove(BlueStore::TransContext*, 
> boost::intrusive_ptr<BlueStore::Collection>&, 
> boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
> 15: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
> 16: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
> 17: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
> 18: (non-virtual thunk to 
> PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, 
> boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
> 19: 
> (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) 
> [0x563f96f2a970]
> 20: 
> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0
> x298) [0x563f96f32d38]
> 21: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) 
> [0x563f96e4486a]
> 22: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, 
> ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
> 23: (OSD::dequeue_op(boost::intrusive_ptr<PG>, 
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) 
> [0x563f96c34da2]
> 24: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, 
> ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
> 25: (OSD::ShardedOpWQ::_process(unsigned int, 
> ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
> 26: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) 
> [0x563f97203c46]
> 27: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) 
> [0x563f97206760]
> 28: (()+0x7dd5) [0x7f1e504eddd5]
> 29: (clone()+0x6d) [0x7f1e4f3ad02d]
>
>      0> 2020-08-17 15:45:27.609 7f1e2fa82700 -1 *** Caught signal 
> (Aborted) ** in thread 7f1e2fa82700 thread_name:tp_osd_tp
>
> ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) 
> nautilus (stable)
> 1: (()+0xf5d0) [0x7f1e504f55d0]
> 2: (gsignal()+0x37) [0x7f1e4f2e52c7]
> 3: (abort()+0x148) [0x7f1e4f2e69b8]
> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
> const*)+0x199) [0x563f96b55134]
> 5: (()+0x4d72ad) [0x563f96b552ad]
> 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, 
> unsigned long, unsigned long, ceph::buffer::v14_2_0::list*, 
> char*)+0xf0e) [0x563f9715aa9e]
> 7: (BlueRocksRandomAccessFile::Prefetch(unsigned long, unsigned 
> long)+0x2a) [0x563f9718453a]
> 8: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, 
> rocksdb::Slice>::InitDataBlock()+0x29f) [0x563f9772697f]
> 9: (rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, 
> rocksdb::Slice>::FindKeyForward()+0x1c0) [0x563f97726bb0]
> 10: (()+0x102fd29) [0x563f976add29]
> 11: (rocksdb::MergingIterator::Next()+0x42) [0x563f97738162]
> 12: (rocksdb::DBIter::Next()+0x1f3) [0x563f97641e53]
> 13: (RocksDBStore::RocksDBWholeSpaceIteratorImpl::next()+0x2d) 
> [0x563f975b36bd]
> 14: (RocksDBStore::RocksDBTransactionImpl::rm_range_keys(std::string 
> const&, std::string const&, std::string const&)+0x567) 
> [0x563f975beab7]
> 15: (BlueStore::_do_omap_clear(BlueStore::TransContext*, std::string 
> const&, unsigned long)+0x72) [0x563f9708f2f2]
> 16: (BlueStore::_do_remove(BlueStore::TransContext*, 
> boost::intrusive_ptr<BlueStore::Collection>&, 
> boost::intrusive_ptr<BlueStore::Onode>)+0xc16) [0x563f970a6026]
> 17: (BlueStore::_remove(BlueStore::TransContext*, 
> boost::intrusive_ptr<BlueStore::Collection>&, 
> boost::intrusive_ptr<BlueStore::Onode>&)+0x5f) [0x563f970a6cbf]
> 18: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, 
> ObjectStore::Transaction*)+0x13f5) [0x563f970acca5]
> 19: (BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, std::vector<ObjectStore::Transaction, std::allocator<ObjectStore::Transaction> >&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x370) [0x563f970c1100]
> 20: (ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0x7f) [0x563f96cb6d3f]
> 21: (non-virtual thunk to 
> PrimaryLogPG::queue_transaction(ObjectStore::Transaction&&, 
> boost::intrusive_ptr<OpRequest>)+0x4f) [0x563f96e3015f]
> 22: 
> (ReplicatedBackend::_do_push(boost::intrusive_ptr<OpRequest>)+0x4a0) 
> [0x563f96f2a970]
> 23: 
> (ReplicatedBackend::_handle_message(boost::intrusive_ptr<OpRequest>)+0
> x298) [0x563f96f32d38]
> 24: (PGBackend::handle_message(boost::intrusive_ptr<OpRequest>)+0x4a) 
> [0x563f96e4486a]
> 25: (PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, 
> ThreadPool::TPHandle&)+0x5b3) [0x563f96df4c63]
> 26: (OSD::dequeue_op(boost::intrusive_ptr<PG>, 
> boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x362) 
> [0x563f96c34da2]
> 27: (PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, 
> ThreadPool::TPHandle&)+0x62) [0x563f96ec37c2]
> 28: (OSD::ShardedOpWQ::_process(unsigned int, 
> ceph::heartbeat_handle_d*)+0x90f) [0x563f96c4fd3f]
> 29: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5b6) 
> [0x563f97203c46]
> 30: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) 
> [0x563f97206760]
> 31: (()+0x7dd5) [0x7f1e504eddd5]
> 32: (clone()+0x6d) [0x7f1e4f3ad02d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
>
> Any ideas, or has anyone seen a similar situation?
>
>
> Manuel Ríos Fernández
> CEO - Business development
> 677677179 · 
> mrios...@easydatahost.com
>
>
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
