Hi list,

After the nodes ran out of memory (OOM) and were rebooted, we are no longer able to restart the ceph-osd@x services. (Details about the setup are at the end.)
I tried starting one of the failing OSDs manually so we could see the error, but all I get is a series of crash dumps; the output below is from just one of the OSDs that will not start. Any idea how to get past this?

[root@ceph001 ~]# /usr/bin/ceph-osd --debug_osd 10 -f --cluster ceph --id 83 --setuser ceph --setgroup ceph > /tmp/dump 2>&1
starting osd.83 at - osd_data /var/lib/ceph/osd/ceph-83 /var/lib/ceph/osd/ceph-83/journal
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14b) [0x2aaaaaf3d36b]
 2: (()+0x26e4f7) [0x2aaaaaf3d4f7]
 3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d]
 4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a]
 5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100]
 6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb]
 7: (OSD::load_pgs()+0x4a9) [0x555555917e39]
 8: (OSD::init()+0xc99) [0x5555559238e9]
 9: (main()+0x23a3) [0x5555558017a3]
 10: (__libc_start_main()+0xf5) [0x2aaab77de495]
 11: (()+0x385900) [0x5555558d9900]
*** Caught signal (Aborted) **
 in thread 2aaaaaaf5540 thread_name:ceph-osd
 ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic (stable)
 1: (()+0xf5d0) [0x2aaab69765d0]
 2: (gsignal()+0x37) [0x2aaab77f22c7]
 3: (abort()+0x148) [0x2aaab77f39b8]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x248) [0x2aaaaaf3d468]
 5: (()+0x26e4f7) [0x2aaaaaf3d4f7]
 6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned long)+0x46d) [0x555555c0bd3d]
 7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, PGBackend::Listener*, coll_t, boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*, CephContext*)+0x30a) [0x555555b0ba8a]
 8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap const>, PGPool const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, spg_t)+0x140) [0x555555abd100]
 9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb) [0x555555914ecb]
 10: (OSD::load_pgs()+0x4a9) [0x555555917e39]
 11: (OSD::init()+0xc99) [0x5555559238e9]
 12: (main()+0x23a3) [0x5555558017a3]
 13: (__libc_start_main()+0xf5) [0x2aaab77de495]
 14: (()+0x385900) [0x5555558d9900]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

[The log then repeats the same assert and both backtraces via the logging code, with timestamps; I have trimmed those duplicates.]
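For anyone skimming the dump, the interesting line is `FAILED assert(stripe_width % stripe_size == 0)` from `ECUtil.h:34`. As far as I can tell from the Mimic source, `stripe_info_t` divides the pool's stripe_width into equal chunks and aborts when the division isn't exact. A minimal model of that invariant (my own sketch, not Ceph's actual code; names are illustrative):

```python
def make_stripe_info(stripe_size: int, stripe_width: int) -> dict:
    """Model of the ECUtil::stripe_info_t constructor invariant:
    stripe_width must be an exact multiple of stripe_size, otherwise
    the constructor asserts -- which is what aborts our OSDs while
    they load the PGs of the erasure-coded pool."""
    if stripe_width % stripe_size != 0:
        raise AssertionError(
            f"stripe_width {stripe_width} % stripe_size {stripe_size} != 0")
    return {"stripe_width": stripe_width,
            "chunk_size": stripe_width // stripe_size}
```

So the OSD is reading back stripe parameters for the pool that do not divide evenly, and it aborts before finishing load_pgs().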
[The recent-events section at the end of the crash dump (the "-693>" lines) repeats the same assert and backtraces once more; trimmed.]
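To sanity-check my reading of the assert: per the Ceph erasure-code documentation, a pool's stripe_width is derived as k * stripe_unit (stripe_unit defaulting to 4096 bytes), which by construction is divisible by k. A small sketch (the profile values below are made up for illustration, not our actual profile):

```python
def expected_stripe_width(profile: dict) -> int:
    """Derive stripe_width the way the docs describe it:
    k data chunks, each stripe_unit bytes wide (default 4096)."""
    k = int(profile["k"])
    stripe_unit = int(profile.get("stripe_unit", 4096))
    return k * stripe_unit

# Illustrative profile only -- the point is that a width derived this
# way always passes the divisibility check, so a failing assert
# suggests the stripe parameters the OSD reads back are inconsistent.
profile = {"k": "8", "m": "3", "plugin": "jerasure"}
width = expected_stripe_width(profile)
assert width % int(profile["k"]) == 0
```

If that reasoning holds, comparing the pool's actual stripe_width (from `ceph osd pool ls detail`) against k * stripe_unit from the pool's erasure-code profile would be the next thing to check.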
Environment:

[root@ceph001 ~]# uname -r
3.10.0-957.27.2.el7.x86_64
[root@ceph001 ~]# cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
[root@ceph001 ~]# rpm -qa | grep -i ceph
cm-config-ceph-release-mimic-8.2-73_cm8.2.noarch
ceph-13.2.6-0.el7.x86_64
ceph-selinux-13.2.6-0.el7.x86_64
ceph-base-13.2.6-0.el7.x86_64
ceph-osd-13.2.6-0.el7.x86_64
cm-config-ceph-radosgw-systemd-8.2-6_cm8.2.noarch
libcephfs2-13.2.6-0.el7.x86_64
ceph-common-13.2.6-0.el7.x86_64
ceph-mgr-13.2.6-0.el7.x86_64
cm-config-ceph-systemd-8.2-12_cm8.2.noarch
ceph-mon-13.2.6-0.el7.x86_64
python-cephfs-13.2.6-0.el7.x86_64
ceph-mds-13.2.6-0.el7.x86_64

ceph osd tree:

ID  CLASS WEIGHT    TYPE NAME        STATUS REWEIGHT PRI-AFF
 -1       785.95801 root default
 -5       261.98599     host ceph001
  1   hdd   7.27699         osd.1        up  1.00000 1.00000
  3   hdd   7.27699         osd.3      down  1.00000 1.00000
  6   hdd   7.27699         osd.6      down  1.00000 1.00000
  9   hdd   7.27699         osd.9      down        0 1.00000
 12   hdd   7.27699         osd.12     down  1.00000 1.00000
 15   hdd   7.27699         osd.15       up  1.00000 1.00000
 18   hdd   7.27699         osd.18     down  1.00000 1.00000
 21   hdd   7.27699         osd.21     down  1.00000 1.00000
 24   hdd   7.27699         osd.24       up  1.00000 1.00000
 27   hdd   7.27699         osd.27     down  1.00000 1.00000
 30   hdd   7.27699         osd.30     down  1.00000 1.00000
 35   hdd   7.27699         osd.35     down  1.00000 1.00000
 37   hdd   7.27699         osd.37     down  1.00000 1.00000
 40   hdd   7.27699         osd.40     down  1.00000 1.00000
 44   hdd   7.27699         osd.44     down  1.00000 1.00000
 47   hdd   7.27699         osd.47       up  1.00000 1.00000
 50   hdd   7.27699         osd.50       up  1.00000 1.00000
 53   hdd   7.27699         osd.53     down  1.00000 1.00000
 56   hdd   7.27699         osd.56     down  1.00000 1.00000
 59   hdd   7.27699         osd.59       up  1.00000 1.00000
 62   hdd   7.27699         osd.62     down        0 1.00000
 65   hdd   7.27699         osd.65     down  1.00000 1.00000
 68   hdd   7.27699         osd.68     down  1.00000 1.00000
 71   hdd   7.27699         osd.71     down  1.00000 1.00000
 74   hdd   7.27699         osd.74     down  1.00000 1.00000
 77   hdd   7.27699         osd.77       up  1.00000 1.00000
 80   hdd   7.27699         osd.80     down  1.00000 1.00000
 83   hdd   7.27699         osd.83       up  1.00000 1.00000
 86   hdd   7.27699         osd.86     down  1.00000 1.00000
 88   hdd   7.27699         osd.88     down  1.00000 1.00000
 91   hdd   7.27699         osd.91     down  1.00000 1.00000
 94   hdd   7.27699         osd.94     down  1.00000 1.00000
 97   hdd   7.27699         osd.97     down  1.00000 1.00000
100   hdd   7.27699         osd.100    down        0 1.00000
103   hdd   7.27699         osd.103    down  1.00000 1.00000
106   hdd   7.27699         osd.106      up  1.00000 1.00000
 -3       261.98599     host ceph002
  0   hdd   7.27699         osd.0      down        0 1.00000
  4   hdd   7.27699         osd.4        up  1.00000 1.00000
  7   hdd   7.27699         osd.7        up  1.00000 1.00000
 11   hdd   7.27699         osd.11     down  1.00000 1.00000
 13   hdd   7.27699         osd.13       up  1.00000 1.00000
 16   hdd   7.27699         osd.16     down  1.00000 1.00000
 19   hdd   7.27699         osd.19     down        0 1.00000
 23   hdd   7.27699         osd.23       up  1.00000 1.00000
 26   hdd   7.27699         osd.26     down        0 1.00000
 29   hdd   7.27699         osd.29     down        0 1.00000
 32   hdd   7.27699         osd.32     down        0 1.00000
 33   hdd   7.27699         osd.33     down        0 1.00000
 36   hdd   7.27699         osd.36     down        0 1.00000
 39   hdd   7.27699         osd.39     down  1.00000 1.00000
 43   hdd   7.27699         osd.43       up  1.00000 1.00000
 46   hdd   7.27699         osd.46       up  1.00000 1.00000
 49   hdd   7.27699         osd.49     down  1.00000 1.00000
 52   hdd   7.27699         osd.52     down  1.00000 1.00000
 55   hdd   7.27699         osd.55     down        0 1.00000
 58   hdd   7.27699         osd.58       up  1.00000 1.00000
 61   hdd   7.27699         osd.61     down  1.00000 1.00000
 64   hdd   7.27699         osd.64     down  1.00000 1.00000
 67   hdd   7.27699         osd.67       up  1.00000 1.00000
 70   hdd   7.27699         osd.70     down  1.00000 1.00000
 73   hdd   7.27699         osd.73     down  1.00000 1.00000
 76   hdd   7.27699         osd.76       up  1.00000 1.00000
 78   hdd   7.27699         osd.78     down  1.00000 1.00000
 81   hdd   7.27699         osd.81     down  1.00000 1.00000
 84   hdd   7.27699         osd.84     down        0 1.00000
 87   hdd   7.27699         osd.87     down  1.00000 1.00000
 90   hdd   7.27699         osd.90     down        0 1.00000
 93   hdd   7.27699         osd.93     down  1.00000 1.00000
 96   hdd   7.27699         osd.96     down        0 1.00000
 99   hdd   7.27699         osd.99     down        0 1.00000
102   hdd   7.27699         osd.102    down        0 1.00000
105   hdd   7.27699         osd.105      up  1.00000 1.00000
 -7       261.98599     host ceph003
  2   hdd   7.27699         osd.2        up  1.00000 1.00000
  5   hdd   7.27699         osd.5      down  1.00000 1.00000
  8   hdd   7.27699         osd.8        up  1.00000 1.00000
 10   hdd   7.27699         osd.10     down        0 1.00000
 14   hdd   7.27699         osd.14     down        0 1.00000
 17   hdd   7.27699         osd.17       up  1.00000 1.00000
 20   hdd   7.27699         osd.20     down        0 1.00000
 22   hdd   7.27699         osd.22     down        0 1.00000
 25   hdd   7.27699         osd.25       up  1.00000 1.00000
 28   hdd   7.27699         osd.28       up  1.00000 1.00000
 31   hdd   7.27699         osd.31     down        0 1.00000
 34   hdd   7.27699         osd.34     down        0 1.00000
 38   hdd   7.27699         osd.38     down        0 1.00000
 41   hdd   7.27699         osd.41     down  1.00000 1.00000
 42   hdd   7.27699         osd.42     down        0 1.00000
 45   hdd   7.27699         osd.45       up  1.00000 1.00000
 48   hdd   7.27699         osd.48       up  1.00000 1.00000
 51   hdd   7.27699         osd.51     down  1.00000 1.00000
 54   hdd   7.27699         osd.54       up  1.00000 1.00000
 57   hdd   7.27699         osd.57     down  1.00000 1.00000
 60   hdd   7.27699         osd.60     down  1.00000 1.00000
 63   hdd   7.27699         osd.63       up  1.00000 1.00000
 66   hdd   7.27699         osd.66     down  1.00000 1.00000
 69   hdd   7.27699         osd.69       up  1.00000 1.00000
 72   hdd   7.27699         osd.72       up  1.00000 1.00000
 75   hdd   7.27699         osd.75     down  1.00000 1.00000
 79   hdd   7.27699         osd.79       up  1.00000 1.00000
 82   hdd   7.27699         osd.82     down  1.00000 1.00000
 85   hdd   7.27699         osd.85     down  1.00000 1.00000
 89   hdd   7.27699         osd.89     down        0 1.00000
 92   hdd   7.27699         osd.92     down  1.00000 1.00000
 95   hdd   7.27699         osd.95     down        0 1.00000
 98   hdd   7.27699         osd.98     down        0 1.00000
101   hdd   7.27699         osd.101    down  1.00000 1.00000
104   hdd   7.27699         osd.104    down        0 1.00000
107   hdd   7.27699         osd.107      up  1.00000 1.00000

Ceph status:

[root@ceph001 ~]# ceph status
  cluster:
    id:     54052e72-6835-410e-88a9-af4ac17a8113
    health: HEALTH_WARN
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            48 osds down
            Reduced data availability: 2053 pgs inactive, 2043 pgs down, 7 pgs peering, 3 pgs incomplete, 126 pgs stale
            Degraded data redundancy: 18473/27200783 objects degraded (0.068%), 106 pgs degraded, 103 pgs undersized
            too many PGs per OSD (258 > max 250)

  services:
    mon: 3 daemons, quorum filler001,filler002,bezavrdat-master01
    mgr: bezavrdat-master01(active), standbys: filler002, filler001
    mds: cephfs-1/1/1 up {0=filler002=up:replay}, 1 up:standby
    osd: 108 osds: 32 up, 80 in; 16 remapped pgs

  data:
    pools:   2 pools, 2176 pgs
    objects: 2.73 M objects, 1.7 TiB
    usage:   2.3 TiB used, 580 TiB / 582 TiB avail
    pgs:     94.347% pgs not active
             18473/27200783 objects degraded (0.068%)
             1951 down
             79   active+undersized+degraded
             76   stale+down
             23   stale+active+undersized+degraded
             14   down+remapped
             14   stale+active+clean
             6    stale+peering
             3    active+clean
             3    stale+active+recovery_wait+degraded
             2    incomplete
             2    stale+down+remapped
             1    stale+incomplete
             1    stale+remapped+peering
             1    active+recovering+undersized+degraded+remapped

Thank you in advance!

Regards,

Andrea Del Monaco
HPC Consultant – Big Data & Security
M: +31 612031174
Burgemeester Rijnderslaan 30 – 1185 MC Amstelveen – The Netherlands
atos.net
_______________________________________________ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io