What distribution and kernel are you running?

I recently found my cluster running the stock CentOS 3.10 kernel when I thought
it was running the ELRepo kernel. After forcing it to boot the correct kernel,
my flapping OSD issue went away.
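
For reference, a quick way to check what a node actually booted (assuming a
CentOS/RHEL-style host where grubby is available) is something like:

# cat /etc/os-release       # distribution and release
# uname -r                  # kernel actually running right now
# grubby --default-kernel   # kernel that will be booted next time

On CentOS 7 the stock kernel is 3.10.0-*.el7, while the ELRepo kernels are the
kernel-ml / kernel-lt packages.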

On Tue, Apr 10, 2018, 2:18 AM Jan Marquardt <j...@artfiles.de> wrote:

> Hi,
>
> we are experiencing massive problems with our Ceph setup. After starting
> a "repair pg" because of scrub errors, OSDs started to crash, and we have
> not been able to stop the crashes so far. We are running Ceph 12.2.4. Both
> BlueStore and FileStore OSDs have crashed.
>
> Our cluster currently looks like this:
>
> # ceph -s
>   cluster:
>     id:     c59e56df-2043-4c92-9492-25f05f268d9f
>     health: HEALTH_ERR
>             1 osds down
>             73005/17149710 objects misplaced (0.426%)
>             5 scrub errors
>             Reduced data availability: 2 pgs inactive, 2 pgs down
>             Possible data damage: 1 pg inconsistent
>             Degraded data redundancy: 611518/17149710 objects degraded
> (3.566%), 86 pgs degraded, 86 pgs undersized
>
>   services:
>     mon: 3 daemons, quorum head1,head2,head3
>     mgr: head3(active), standbys: head2, head1
>     osd: 34 osds: 24 up, 25 in; 18 remapped pgs
>
>   data:
>     pools:   1 pools, 768 pgs
>     objects: 5582k objects, 19500 GB
>     usage:   62030 GB used, 31426 GB / 93456 GB avail
>     pgs:     0.260% pgs not active
>              611518/17149710 objects degraded (3.566%)
>              73005/17149710 objects misplaced (0.426%)
>              670 active+clean
>              75  active+undersized+degraded
>              8   active+undersized+degraded+remapped+backfill_wait
>              8   active+clean+remapped
>              2   down
>              2   active+undersized+degraded+remapped+backfilling
>              2   active+clean+scrubbing+deep
>              1   active+undersized+degraded+inconsistent
>
>   io:
>     client:   10911 B/s rd, 118 kB/s wr, 0 op/s rd, 54 op/s wr
>     recovery: 31575 kB/s, 8 objects/s
>
> # ceph osd tree
> ID  CLASS WEIGHT    TYPE NAME      STATUS REWEIGHT PRI-AFF
>  -1       124.07297 root default
>  -2        29.08960     host ceph1
>   0   hdd   3.63620         osd.0      up  1.00000 1.00000
>   1   hdd   3.63620         osd.1    down        0 1.00000
>   2   hdd   3.63620         osd.2      up  1.00000 1.00000
>   3   hdd   3.63620         osd.3      up  1.00000 1.00000
>   4   hdd   3.63620         osd.4    down        0 1.00000
>   5   hdd   3.63620         osd.5    down        0 1.00000
>   6   hdd   3.63620         osd.6      up  1.00000 1.00000
>   7   hdd   3.63620         osd.7      up  1.00000 1.00000
>  -3         7.27240     host ceph2
>  14   hdd   3.63620         osd.14     up  1.00000 1.00000
>  15   hdd   3.63620         osd.15     up  1.00000 1.00000
>  -4        29.11258     host ceph3
>  16   hdd   3.63620         osd.16     up  1.00000 1.00000
>  18   hdd   3.63620         osd.18   down        0 1.00000
>  19   hdd   3.63620         osd.19   down        0 1.00000
>  20   hdd   3.65749         osd.20     up  1.00000 1.00000
>  21   hdd   3.63620         osd.21     up  1.00000 1.00000
>  22   hdd   3.63620         osd.22     up  1.00000 1.00000
>  23   hdd   3.63620         osd.23     up  1.00000 1.00000
>  24   hdd   3.63789         osd.24   down        0 1.00000
>  -9        29.29919     host ceph4
>  17   hdd   3.66240         osd.17     up  1.00000 1.00000
>  25   hdd   3.66240         osd.25     up  1.00000 1.00000
>  26   hdd   3.66240         osd.26   down        0 1.00000
>  27   hdd   3.66240         osd.27     up  1.00000 1.00000
>  28   hdd   3.66240         osd.28   down        0 1.00000
>  29   hdd   3.66240         osd.29     up  1.00000 1.00000
>  30   hdd   3.66240         osd.30     up  1.00000 1.00000
>  31   hdd   3.66240         osd.31   down        0 1.00000
> -11        29.29919     host ceph5
>  32   hdd   3.66240         osd.32     up  1.00000 1.00000
>  33   hdd   3.66240         osd.33     up  1.00000 1.00000
>  34   hdd   3.66240         osd.34     up  1.00000 1.00000
>  35   hdd   3.66240         osd.35     up  1.00000 1.00000
>  36   hdd   3.66240         osd.36   down  1.00000 1.00000
>  37   hdd   3.66240         osd.37     up  1.00000 1.00000
>  38   hdd   3.66240         osd.38     up  1.00000 1.00000
>  39   hdd   3.66240         osd.39     up  1.00000 1.00000
>
> The last OSDs that crashed are #28 and #36. Please find the
> corresponding log files here:
>
> http://af.janno.io/ceph/ceph-osd.28.log.1.gz
> http://af.janno.io/ceph/ceph-osd.36.log.1.gz
>
> The backtraces look almost the same for all crashed OSDs.
>
> Any help, hint, or advice would really be appreciated. Please let me know
> if you need any further information.
>
> Best Regards
>
> Jan
>
> --
> Artfiles New Media GmbH | Zirkusweg 1 | 20359 Hamburg
> Tel: 040 - 32 02 72 90 | Fax: 040 - 32 02 72 95
> E-Mail: supp...@artfiles.de | Web: http://www.artfiles.de
> Geschäftsführer: Harald Oltmanns | Tim Evers
> Eingetragen im Handelsregister Hamburg - HRB 81478
>