Hi all,
        RocksDB failed to open when the ceph-osd process was restarted
after the OSD data disk had been unplugged, running Ceph 14.2.5 on CentOS
7.6.
        
        1) After the OSD data disk was unplugged, the ceph-osd process
aborted (a sketch of my reading of this code path follows the backtrace):
        -3> 2020-07-13 15:25:35.912 7f1ad7254700 -1 bdev(0x559d1134f880 
/var/lib/ceph/osd/ceph-10/block) _sync_write sync_file_range error: (5) 
Input/output error
    -2> 2020-07-13 15:25:35.912 7f1ad9c5f700 -1 bdev(0x559d1134f880 
/var/lib/ceph/osd/ceph-10/block) _aio_thread got r=-5 ((5) Input/output error)
    -1> 2020-07-13 15:25:35.917 7f1ad9c5f700 -1 
/root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: In 
function 'void KernelDevice::_aio_thread()' thread 7f1ad9c5f700 time 2020-07-13 
15:25:35.913821
        
/root/rpmbuild/BUILD/ceph-14.2.5-1.0.9/src/os/bluestore/KernelDevice.cc: 534: 
ceph_abort_msg("Unexpected IO error. This may suggest a hardware issue. Please 
check your kernel log!")

        ceph version 14.2.5-93-g9a4f93e 
(9a4f93e7143bcdd5fadc88eb58bb730ae97b89c5) nautilus (stable)
        1: (ceph::__ceph_abort(char const*, int, char const*, std::string 
const&)+0xdd) [0x559d05b6069a]
        2: (KernelDevice::_aio_thread()+0xebe) [0x559d061a54ee]
        3: (KernelDevice::AioCompletionThread::entry()+0xd) [0x559d061a7add]
        4: (()+0x7dd5) [0x7f1ae66aedd5]
        5: (clone()+0x6d) [0x7f1ae5572ead]
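        As I understand the 14.2.x source (paraphrasing from memory, not
quoting it), the aio completion thread treats any negative return value as
fatal and aborts immediately; there is no retry path. A minimal,
self-contained sketch of just that decision, with names of my own
invention rather than Ceph's:

    // Loose stand-in for the failure path in KernelDevice::_aio_thread()
    // (KernelDevice.cc:534 in the trace above). Only the abort-on-error
    // behaviour is the point; the types and names here are hypothetical.
    #include <cerrno>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    struct AioCompletion { long ret; };  // one completed aio, e.g. from io_getevents()

    void on_aio_complete(const AioCompletion& c) {
      if (c.ret < 0) {
        std::fprintf(stderr, "_aio_thread got r=%ld ((%ld) %s)\n",
                     c.ret, -c.ret, std::strerror(static_cast<int>(-c.ret)));
        // no retry: a device-level error brings down the whole OSD process,
        // matching the ceph_abort_msg("Unexpected IO error. ...") above
        std::abort();
      }
    }

    int main() {
      on_aio_complete(AioCompletion{-EIO});  // what the unplugged disk returned
    }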
        
        2) After the disk was plugged back in and the ceph-osd process was
restarted, RocksDB found an incomplete record in its WAL and refused to
open (notes on the recovery mode follow the log):
        2020-07-13 15:51:38.305 7f9801ef5a80  4 rocksdb: 
[db/db_impl_open.cc:583] Recovering log #9 mode 0
        2020-07-13 15:51:38.748 7f9801ef5a80  3 rocksdb: 
[db/db_impl_open.cc:518] db.wal/000009.log: dropping 2922 bytes; Corruption: 
missing start of fragmented record(2)
        2020-07-13 15:51:38.748 7f9801ef5a80  4 rocksdb: [db/db_impl.cc:390] 
Shutdown: canceling all background work
        2020-07-13 15:51:38.748 7f9801ef5a80  4 rocksdb: [db/db_impl.cc:563] 
Shutdown complete
        2020-07-13 15:51:38.748 7f9801ef5a80 -1 rocksdb: Corruption: missing 
start of fragmented record(2)
        2020-07-13 15:51:38.748 7f9801ef5a80 -1 
bluestore(/var/lib/ceph/osd/ceph-10) _open_db erroring opening db:
        2020-07-13 15:51:38.748 7f9801ef5a80  1 bluefs umount
        2020-07-13 15:51:38.776 7f9801ef5a80  1 fbmap_alloc 0x55c897e0a900 
shutdown
        2020-07-13 15:51:38.776 7f9801ef5a80  1 bdev(0x55c898a6ce00 
/var/lib/ceph/osd/ceph-10/block) close
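        For what it's worth, the "Recovering log #9 mode 0" line above
appears to refer to RocksDB's WAL recovery mode: if I read
rocksdb/options.h correctly, mode 0 is
WALRecoveryMode::kTolerateCorruptedTailRecords, which is documented to
tolerate only an incomplete record at the tail of the log and otherwise
fails the open. A standalone sketch of that knob against a scratch path
(illustration only: BlueStore's RocksDB lives inside BlueFS, so this is
not a way to open an OSD's DB):

    #include <rocksdb/db.h>
    #include <rocksdb/options.h>
    #include <iostream>

    int main() {
      rocksdb::Options opts;
      opts.create_if_missing = true;
      // "mode 0" from the log above; the alternatives are
      // kAbsoluteConsistency, kPointInTimeRecovery (replay up to the first
      // corruption) and kSkipAnyCorruptedRecords (drop bad records, go on)
      opts.wal_recovery_mode =
          rocksdb::WALRecoveryMode::kTolerateCorruptedTailRecords;
      rocksdb::DB* db = nullptr;
      rocksdb::Status s = rocksdb::DB::Open(opts, "/tmp/wal-test", &db);
      if (!s.ok()) {
        std::cerr << "open failed: " << s.ToString() << std::endl;
        return 1;
      }
      delete db;
      return 0;
    }

        So RocksDB apparently could be told to skip the bad record, but the
mode this OSD runs with does not, which is exactly what prompts my
questions: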
        
        Why does RocksDB not simply drop these incomplete records and
continue? And once an OSD has ended up in this state, what is the
recommended way to recover it?