The disk got corrupted; it might be dead. Check the kernel log for I/O errors and the drive's SMART reallocated sector count and error counters.
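For reference, that check could look roughly like this (a sketch only; /dev/sdX is a placeholder for the OSD's data device):

    dmesg -T | grep -iE 'i/o error|medium error|blk_update_request'
    smartctl -a /dev/sdX | grep -iE 'reallocated|pending|uncorrect'

A growing Reallocated_Sector_Ct or any non-zero Current_Pending_Sector / Offline_Uncorrectable count usually means the drive should be replaced.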
If the disk is still good: simply re-create the OSD (a rough sketch of the usual steps is at the bottom of this mail, below the quote).

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Fri, May 24, 2019 at 3:51 PM Guillaume Chenuet <guillaume.chen...@schibsted.com> wrote:
> Hi,
>
> We are running a Ceph cluster with 36 OSDs split across 3 servers (12 OSDs per server), on Ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable).
>
> This cluster is used by an OpenStack private cloud and was deployed with OpenStack Kolla. Every OSD runs in a Docker container on its server, and the MON, MGR, MDS, and RGW daemons run on 3 other servers.
>
> This week, one OSD crashed and failed to restart with this stack trace:
>
> Running command: '/usr/bin/ceph-osd -f --public-addr 10.106.142.30 --cluster-addr 10.106.142.30 -i 35'
> + exec /usr/bin/ceph-osd -f --public-addr 10.106.142.30 --cluster-addr 10.106.142.30 -i 35
> starting osd.35 at - osd_data /var/lib/ceph/osd/ceph-35 /var/lib/ceph/osd/ceph-35/journal
> /builddir/build/BUILD/ceph-12.2.11/src/os/bluestore/BlueFS.cc: In function 'int BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, uint64_t, size_t, ceph::bufferlist*, char*)' thread 7efd088d6d80 time 2019-05-24 05:40:47.799918
> /builddir/build/BUILD/ceph-12.2.11/src/os/bluestore/BlueFS.cc: 1000: FAILED assert(r == 0)
> ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x110) [0x556f7833f8f0]
> 2: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xca4) [0x556f782b5574]
> 3: (BlueFS::_replay(bool)+0x2ef) [0x556f782c82af]
> 4: (BlueFS::mount()+0x1d4) [0x556f782cc014]
> 5: (BlueStore::_open_db(bool)+0x1847) [0x556f781e0ce7]
> 6: (BlueStore::_mount(bool)+0x40e) [0x556f782126ae]
> 7: (OSD::init()+0x3bd) [0x556f77dbbaed]
> 8: (main()+0x2d07) [0x556f77cbe667]
> 9: (__libc_start_main()+0xf5) [0x7efd04fa63d5]
> 10: (()+0x4c1f73) [0x556f77d5ef73]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
> *** Caught signal (Aborted) **
> in thread 7efd088d6d80 thread_name:ceph-osd
> ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
> 1: (()+0xa63931) [0x556f78300931]
> 2: (()+0xf5d0) [0x7efd05f995d0]
> 3: (gsignal()+0x37) [0x7efd04fba207]
> 4: (abort()+0x148) [0x7efd04fbb8f8]
> 5: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x284) [0x556f7833fa64]
> 6: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0xca4) [0x556f782b5574]
> 7: (BlueFS::_replay(bool)+0x2ef) [0x556f782c82af]
> 8: (BlueFS::mount()+0x1d4) [0x556f782cc014]
> 9: (BlueStore::_open_db(bool)+0x1847) [0x556f781e0ce7]
> 10: (BlueStore::_mount(bool)+0x40e) [0x556f782126ae]
> 11: (OSD::init()+0x3bd) [0x556f77dbbaed]
> 12: (main()+0x2d07) [0x556f77cbe667]
> 13: (__libc_start_main()+0xf5) [0x7efd04fa63d5]
> 14: (()+0x4c1f73) [0x556f77d5ef73]
>
> The cluster health is OK and Ceph sees this OSD as down.
>
> I tried to find more information about this error on the internet, without luck.
> Do you have any ideas or input about this error, please?
>
> Thanks,
> Guillaume
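PS: re-creating the OSD on a plain ceph-volume deployment looks roughly like the following. This is only a sketch: osd.35 and /dev/sdX are placeholders for your OSD id and data device, and a Kolla-managed cluster would normally drive these steps through its own tooling rather than by hand.

    ceph osd out 35
    # stop the ceph-osd process (in a Kolla setup, stop the OSD's Docker container)
    ceph osd purge 35 --yes-i-really-mean-it    # removes the OSD from the CRUSH map, auth, and OSD map
    ceph-volume lvm zap /dev/sdX                # wipes the old BlueStore data -- only do this if the disk is healthy
    ceph-volume lvm create --bluestore --data /dev/sdX

Afterwards let the cluster backfill and watch `ceph -s` until it reports HEALTH_OK again.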
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com