On Tue, Oct 1, 2019 at 10:43 PM Del Monaco, Andrea <
andrea.delmon...@atos.net> wrote:

> Hi list,
>
> After the nodes ran out of memory (OOM) and were rebooted, we are no longer
> able to restart the ceph-osd@x services. (Details about the setup are at
> the end.)
>
> I am trying to start the OSDs manually so we can see the error, but all I
> see are several crash dumps - this is just one of the OSDs that is not
> starting. Any idea how to get past this?
> [root@ceph001 ~]# /usr/bin/ceph-osd --debug_osd 10 -f --cluster ceph --id
> 83 --setuser ceph --setgroup ceph  > /tmp/dump 2>&1
> starting osd.83 at - osd_data /var/lib/ceph/osd/ceph-83
> /var/lib/ceph/osd/ceph-83/journal
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
> In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)'
> thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
> 34: FAILED assert(stripe_width % stripe_size == 0)
>  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14b) [0x2aaaaaf3d36b]
>  2: (()+0x26e4f7) [0x2aaaaaf3d4f7]
>  3: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&,
> boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned
> long)+0x46d) [0x555555c0bd3d]
>  4: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string,
> std::string, std::less<std::string>, std::allocator<std::pair<std::string
> const, std::string> > > const&, PGBackend::Listener*, coll_t,
> boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> CephContext*)+0x30a) [0x555555b0ba8a]
>  5: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap
> const>, PGPool const&, std::map<std::string, std::string,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > > const&, spg_t)+0x140) [0x555555abd100]
>  6: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb)
> [0x555555914ecb]
>  7: (OSD::load_pgs()+0x4a9) [0x555555917e39]
>  8: (OSD::init()+0xc99) [0x5555559238e9]
>  9: (main()+0x23a3) [0x5555558017a3]
>  10: (__libc_start_main()+0xf5) [0x2aaab77de495]
>  11: (()+0x385900) [0x5555558d9900]
> 2019-10-01 14:19:49.500 2aaaaaaf5540 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
> In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)'
> thread 2aaaaaaf5540 time 2019-10-01 14:19:49.494368
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.6/rpm/el7/BUILD/ceph-13.2.6/src/osd/ECUtil.h:
> 34: FAILED assert(stripe_width % stripe_size == 0)
>

https://tracker.ceph.com/issues/41336 may be relevant here.

Can you post details of the pool involved as well as the erasure code
profile in use for that pool?
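
Something like the following should show them (the pool and profile names
below are placeholders - substitute the ones from your cluster):

  ceph osd pool ls detail
  ceph osd pool get <ec-pool-name> erasure_code_profile
  ceph osd erasure-code-profile get <profile-name>

The first command includes the pool's stripe_width, the second shows which
erasure code profile the pool references, and the third shows that profile's
plugin and k/m values. As far as I can tell, the failing assert checks that
the pool's stripe_width is an exact multiple of the profile's number of data
chunks (k), so a mismatch between those two values is the thing to look for.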


> *** Caught signal (Aborted) **
>  in thread 2aaaaaaf5540 thread_name:ceph-osd
>  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>  1: (()+0xf5d0) [0x2aaab69765d0]
>  2: (gsignal()+0x37) [0x2aaab77f22c7]
>  3: (abort()+0x148) [0x2aaab77f39b8]
>  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x248) [0x2aaaaaf3d468]
>  5: (()+0x26e4f7) [0x2aaaaaf3d4f7]
>  6: (ECBackend::ECBackend(PGBackend::Listener*, coll_t const&,
> boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> CephContext*, std::shared_ptr<ceph::ErasureCodeInterface>, unsigned
> long)+0x46d) [0x555555c0bd3d]
>  7: (PGBackend::build_pg_backend(pg_pool_t const&, std::map<std::string,
> std::string, std::less<std::string>, std::allocator<std::pair<std::string
> const, std::string> > > const&, PGBackend::Listener*, coll_t,
> boost::intrusive_ptr<ObjectStore::CollectionImpl>&, ObjectStore*,
> CephContext*)+0x30a) [0x555555b0ba8a]
>  8: (PrimaryLogPG::PrimaryLogPG(OSDService*, std::shared_ptr<OSDMap
> const>, PGPool const&, std::map<std::string, std::string,
> std::less<std::string>, std::allocator<std::pair<std::string const,
> std::string> > > const&, spg_t)+0x140) [0x555555abd100]
>  9: (OSD::_make_pg(std::shared_ptr<OSDMap const>, spg_t)+0x10cb)
> [0x555555914ecb]
>  10: (OSD::load_pgs()+0x4a9) [0x555555917e39]
>  11: (OSD::init()+0xc99) [0x5555559238e9]
>  12: (main()+0x23a3) [0x5555558017a3]
>  13: (__libc_start_main()+0xf5) [0x2aaab77de495]
>  14: (()+0x385900) [0x5555558d9900]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> Environment:
> [root@ceph001 ~]# uname -r
> 3.10.0-957.27.2.el7.x86_64
> [root@ceph001 ~]# cat /etc/redhat-release
> CentOS Linux release 7.6.1810 (Core)
> [root@ceph001 ~]# rpm -qa | grep -i ceph
> cm-config-ceph-release-mimic-8.2-73_cm8.2.noarch
> ceph-13.2.6-0.el7.x86_64
> ceph-selinux-13.2.6-0.el7.x86_64
> ceph-base-13.2.6-0.el7.x86_64
> ceph-osd-13.2.6-0.el7.x86_64
> cm-config-ceph-radosgw-systemd-8.2-6_cm8.2.noarch
> libcephfs2-13.2.6-0.el7.x86_64
> ceph-common-13.2.6-0.el7.x86_64
> ceph-mgr-13.2.6-0.el7.x86_64
> cm-config-ceph-systemd-8.2-12_cm8.2.noarch
> ceph-mon-13.2.6-0.el7.x86_64
> python-cephfs-13.2.6-0.el7.x86_64
> ceph-mds-13.2.6-0.el7.x86_64
>
> ceph osd tree:
> ID  CLASS WEIGHT    TYPE NAME        STATUS REWEIGHT PRI-AFF
>  -1       785.95801 root default
>  -5       261.98599     host ceph001
>   1   hdd   7.27699         osd.1        up  1.00000 1.00000
>   3   hdd   7.27699         osd.3      down  1.00000 1.00000
>   6   hdd   7.27699         osd.6      down  1.00000 1.00000
>   9   hdd   7.27699         osd.9      down        0 1.00000
>  12   hdd   7.27699         osd.12     down  1.00000 1.00000
>  15   hdd   7.27699         osd.15       up  1.00000 1.00000
>  18   hdd   7.27699         osd.18     down  1.00000 1.00000
>  21   hdd   7.27699         osd.21     down  1.00000 1.00000
>  24   hdd   7.27699         osd.24       up  1.00000 1.00000
>  27   hdd   7.27699         osd.27     down  1.00000 1.00000
>  30   hdd   7.27699         osd.30     down  1.00000 1.00000
>  35   hdd   7.27699         osd.35     down  1.00000 1.00000
>  37   hdd   7.27699         osd.37     down  1.00000 1.00000
>  40   hdd   7.27699         osd.40     down  1.00000 1.00000
>  44   hdd   7.27699         osd.44     down  1.00000 1.00000
>  47   hdd   7.27699         osd.47       up  1.00000 1.00000
>  50   hdd   7.27699         osd.50       up  1.00000 1.00000
>  53   hdd   7.27699         osd.53     down  1.00000 1.00000
>  56   hdd   7.27699         osd.56     down  1.00000 1.00000
>  59   hdd   7.27699         osd.59       up  1.00000 1.00000
>  62   hdd   7.27699         osd.62     down        0 1.00000
>  65   hdd   7.27699         osd.65     down  1.00000 1.00000
>  68   hdd   7.27699         osd.68     down  1.00000 1.00000
>  71   hdd   7.27699         osd.71     down  1.00000 1.00000
>  74   hdd   7.27699         osd.74     down  1.00000 1.00000
>  77   hdd   7.27699         osd.77       up  1.00000 1.00000
>  80   hdd   7.27699         osd.80     down  1.00000 1.00000
>  83   hdd   7.27699         osd.83       up  1.00000 1.00000
>  86   hdd   7.27699         osd.86     down  1.00000 1.00000
>  88   hdd   7.27699         osd.88     down  1.00000 1.00000
>  91   hdd   7.27699         osd.91     down  1.00000 1.00000
>  94   hdd   7.27699         osd.94     down  1.00000 1.00000
>  97   hdd   7.27699         osd.97     down  1.00000 1.00000
> 100   hdd   7.27699         osd.100    down        0 1.00000
> 103   hdd   7.27699         osd.103    down  1.00000 1.00000
> 106   hdd   7.27699         osd.106      up  1.00000 1.00000
>  -3       261.98599     host ceph002
>   0   hdd   7.27699         osd.0      down        0 1.00000
>   4   hdd   7.27699         osd.4        up  1.00000 1.00000
>   7   hdd   7.27699         osd.7        up  1.00000 1.00000
>  11   hdd   7.27699         osd.11     down  1.00000 1.00000
>  13   hdd   7.27699         osd.13       up  1.00000 1.00000
>  16   hdd   7.27699         osd.16     down  1.00000 1.00000
>  19   hdd   7.27699         osd.19     down        0 1.00000
>  23   hdd   7.27699         osd.23       up  1.00000 1.00000
>  26   hdd   7.27699         osd.26     down        0 1.00000
>  29   hdd   7.27699         osd.29     down        0 1.00000
>  32   hdd   7.27699         osd.32     down        0 1.00000
>  33   hdd   7.27699         osd.33     down        0 1.00000
>  36   hdd   7.27699         osd.36     down        0 1.00000
>  39   hdd   7.27699         osd.39     down  1.00000 1.00000
>  43   hdd   7.27699         osd.43       up  1.00000 1.00000
>  46   hdd   7.27699         osd.46       up  1.00000 1.00000
>  49   hdd   7.27699         osd.49     down  1.00000 1.00000
>  52   hdd   7.27699         osd.52     down  1.00000 1.00000
>  55   hdd   7.27699         osd.55     down        0 1.00000
>  58   hdd   7.27699         osd.58       up  1.00000 1.00000
>  61   hdd   7.27699         osd.61     down  1.00000 1.00000
>  64   hdd   7.27699         osd.64     down  1.00000 1.00000
>  67   hdd   7.27699         osd.67       up  1.00000 1.00000
>  70   hdd   7.27699         osd.70     down  1.00000 1.00000
>  73   hdd   7.27699         osd.73     down  1.00000 1.00000
>  76   hdd   7.27699         osd.76       up  1.00000 1.00000
>  78   hdd   7.27699         osd.78     down  1.00000 1.00000
>  81   hdd   7.27699         osd.81     down  1.00000 1.00000
>  84   hdd   7.27699         osd.84     down        0 1.00000
>  87   hdd   7.27699         osd.87     down  1.00000 1.00000
>  90   hdd   7.27699         osd.90     down        0 1.00000
>  93   hdd   7.27699         osd.93     down  1.00000 1.00000
>  96   hdd   7.27699         osd.96     down        0 1.00000
>  99   hdd   7.27699         osd.99     down        0 1.00000
> 102   hdd   7.27699         osd.102    down        0 1.00000
> 105   hdd   7.27699         osd.105      up  1.00000 1.00000
>  -7       261.98599     host ceph003
>   2   hdd   7.27699         osd.2        up  1.00000 1.00000
>   5   hdd   7.27699         osd.5      down  1.00000 1.00000
>   8   hdd   7.27699         osd.8        up  1.00000 1.00000
>  10   hdd   7.27699         osd.10     down        0 1.00000
>  14   hdd   7.27699         osd.14     down        0 1.00000
>  17   hdd   7.27699         osd.17       up  1.00000 1.00000
>  20   hdd   7.27699         osd.20     down        0 1.00000
>  22   hdd   7.27699         osd.22     down        0 1.00000
>  25   hdd   7.27699         osd.25       up  1.00000 1.00000
>  28   hdd   7.27699         osd.28       up  1.00000 1.00000
>  31   hdd   7.27699         osd.31     down        0 1.00000
>  34   hdd   7.27699         osd.34     down        0 1.00000
>  38   hdd   7.27699         osd.38     down        0 1.00000
>  41   hdd   7.27699         osd.41     down  1.00000 1.00000
>  42   hdd   7.27699         osd.42     down        0 1.00000
>  45   hdd   7.27699         osd.45       up  1.00000 1.00000
>  48   hdd   7.27699         osd.48       up  1.00000 1.00000
>  51   hdd   7.27699         osd.51     down  1.00000 1.00000
>  54   hdd   7.27699         osd.54       up  1.00000 1.00000
>  57   hdd   7.27699         osd.57     down  1.00000 1.00000
>  60   hdd   7.27699         osd.60     down  1.00000 1.00000
>  63   hdd   7.27699         osd.63       up  1.00000 1.00000
>  66   hdd   7.27699         osd.66     down  1.00000 1.00000
>  69   hdd   7.27699         osd.69       up  1.00000 1.00000
>  72   hdd   7.27699         osd.72       up  1.00000 1.00000
>  75   hdd   7.27699         osd.75     down  1.00000 1.00000
>  79   hdd   7.27699         osd.79       up  1.00000 1.00000
>  82   hdd   7.27699         osd.82     down  1.00000 1.00000
>  85   hdd   7.27699         osd.85     down  1.00000 1.00000
>  89   hdd   7.27699         osd.89     down        0 1.00000
>  92   hdd   7.27699         osd.92     down  1.00000 1.00000
>  95   hdd   7.27699         osd.95     down        0 1.00000
>  98   hdd   7.27699         osd.98     down        0 1.00000
> 101   hdd   7.27699         osd.101    down  1.00000 1.00000
> 104   hdd   7.27699         osd.104    down        0 1.00000
> 107   hdd   7.27699         osd.107      up  1.00000 1.00000
>
> Ceph status;
> [root@ceph001 ~]# ceph status
>   cluster:
>     id:     54052e72-6835-410e-88a9-af4ac17a8113
>     health: HEALTH_WARN
>             1 filesystem is degraded
>             1 MDSs report slow metadata IOs
>             48 osds down
>             Reduced data availability: 2053 pgs inactive, 2043 pgs down, 7
> pgs peering, 3 pgs incomplete, 126 pgs stale
>             Degraded data redundancy: 18473/27200783 objects degraded
> (0.068%), 106 pgs degraded, 103 pgs undersized
>             too many PGs per OSD (258 > max 250)
>
>   services:
>     mon: 3 daemons, quorum filler001,filler002,bezavrdat-master01
>     mgr: bezavrdat-master01(active), standbys: filler002, filler001
>     mds: cephfs-1/1/1 up  {0=filler002=up:replay}, 1 up:standby
>     osd: 108 osds: 32 up, 80 in; 16 remapped pgs
>
>   data:
>     pools:   2 pools, 2176 pgs
>     objects: 2.73 M objects, 1.7 TiB
>     usage:   2.3 TiB used, 580 TiB / 582 TiB avail
>     pgs:     94.347% pgs not active
>              18473/27200783 objects degraded (0.068%)
>              1951 down
>              79   active+undersized+degraded
>              76   stale+down
>              23   stale+active+undersized+degraded
>              14   down+remapped
>              14   stale+active+clean
>              6    stale+peering
>              3    active+clean
>              3    stale+active+recovery_wait+degraded
>              2    incomplete
>              2    stale+down+remapped
>              1    stale+incomplete
>              1    stale+remapped+peering
>              1    active+recovering+undersized+degraded+remapped
>
> Thank you in advance!
>
> Regards,
>
> *Andrea Del Monaco*
> HPC Consultant – Big Data & Security
> M: +31 612031174
> Burgemeester Rijnderslaan 30 – 1185 MC Amstelveen – The Netherlands
> atos.net


-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
