Hi,

Something seems off in the weights and sizes of hosts ssdstor-a01, ssdstor-b01 and ssdstor-c01: ssdstor-c01 has a weight of 0.4 while its size is ~10TiB, so I would expect the weight to be around 10 instead of 0.4. The same goes for the other two nodes mentioned above. Could you explain this?
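For comparison: as far as I know the CRUSH weight of an OSD defaults to its capacity in TiB, so a ~10TiB device should end up close to 10. This is how I would compare weight and size per host, and correct a single OSD if the weight really is wrong (osd.42 is just a placeholder id, and 10.0 the expected weight):

ceph osd df tree | grep ssdstor       # WEIGHT and SIZE columns side by side
ceph osd crush reweight osd.42 10.0   # reset the weight to the device size in TiB

Note that reweighting moves data, so on a production cluster it is worth doing one OSD at a time.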
Kind regards,
Caspar Smit

On Fri, 14 Jun 2019 at 11:52, Luk <ski...@tlen.pl> wrote:
> Here is ceph osd tree; in the first post there is also ceph osd df tree:
>
> https://pastebin.com/Vs75gpwZ
>
> > Ahh, I was thinking of chooseleaf_vary_r, which you already have.
> > So probably not related to tunables. What is your `ceph osd tree`?
> >
> > By the way, 12.2.9 has an unrelated bug (details:
> > http://tracker.ceph.com/issues/36686)
> > AFAIU you will just need to update to v12.2.11 or v12.2.12 for that fix.
> >
> > -- Dan
> >
> > On Fri, Jun 14, 2019 at 11:29 AM Luk <ski...@tlen.pl> wrote:
> >>
> >> Hi,
> >>
> >> here is the output:
> >>
> >> ceph osd crush show-tunables
> >> {
> >>     "choose_local_tries": 0,
> >>     "choose_local_fallback_tries": 0,
> >>     "choose_total_tries": 100,
> >>     "chooseleaf_descend_once": 1,
> >>     "chooseleaf_vary_r": 1,
> >>     "chooseleaf_stable": 0,
> >>     "straw_calc_version": 1,
> >>     "allowed_bucket_algs": 22,
> >>     "profile": "unknown",
> >>     "optimal_tunables": 0,
> >>     "legacy_tunables": 0,
> >>     "minimum_required_version": "hammer",
> >>     "require_feature_tunables": 1,
> >>     "require_feature_tunables2": 1,
> >>     "has_v2_rules": 0,
> >>     "require_feature_tunables3": 1,
> >>     "has_v3_rules": 0,
> >>     "has_v4_buckets": 1,
> >>     "require_feature_tunables5": 0,
> >>     "has_v5_rules": 0
> >> }
> >>
> >> [root@ceph-mon-01 ~]#
> >>
> >> --
> >> Regards
> >> Lukasz
> >>
> >> > Hi,
> >> > This looks like a tunables issue.
> >> > What is the output of `ceph osd crush show-tunables`?
> >> >
> >> > -- Dan
> >> >
> >> > On Fri, Jun 14, 2019 at 11:19 AM Luk <ski...@tlen.pl> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> Maybe someone has already fought with this kind of stuck state in
> >> >> Ceph. This is a production cluster and I can't/don't want to make
> >> >> any wrong steps, so please advise what to do.
> >> >>
> >> >> After replacing one failed disk (it was osd-7) on our cluster, Ceph
> >> >> didn't recover to HEALTH_OK; it stopped in this state:
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph -s
> >> >>   cluster:
> >> >>     id:     b6f23cff-7279-f4b0-ff91-21fadac95bb5
> >> >>     health: HEALTH_WARN
> >> >>             noout,noscrub,nodeep-scrub flag(s) set
> >> >>             Degraded data redundancy: 24761/45994899 objects degraded (0.054%), 8 pgs degraded, 8 pgs undersized
> >> >>
> >> >>   services:
> >> >>     mon:        3 daemons, quorum ceph-mon-01,ceph-mon-02,ceph-mon-03
> >> >>     mgr:        ceph-mon-03(active), standbys: ceph-mon-02, ceph-mon-01
> >> >>     osd:        144 osds: 144 up, 144 in
> >> >>                 flags noout,noscrub,nodeep-scrub
> >> >>     rbd-mirror: 3 daemons active
> >> >>     rgw:        6 daemons active
> >> >>
> >> >>   data:
> >> >>     pools:   18 pools, 2176 pgs
> >> >>     objects: 15.33M objects, 49.3TiB
> >> >>     usage:   151TiB used, 252TiB / 403TiB avail
> >> >>     pgs:     24761/45994899 objects degraded (0.054%)
> >> >>              2168 active+clean
> >> >>              8    active+undersized+degraded
> >> >>
> >> >>   io:
> >> >>     client: 435MiB/s rd, 415MiB/s wr, 7.94kop/s rd, 2.96kop/s wr
> >> >>
> >> >> Restarting the OSD didn't help, and changing choose_total_tries from
> >> >> 50 to 100 didn't help either.
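For anyone trying to reproduce the choose_total_tries change mentioned above: as far as I know it is done by round-tripping the crushmap through crushtool, roughly like this (the file names are arbitrary placeholders):

ceph osd getcrushmap -o crush.bin     # dump the current binary crushmap
crushtool -d crush.bin -o crush.txt   # decompile it to editable text
# edit the line: tunable choose_total_tries 100
crushtool -c crush.txt -o crush2.bin  # recompile the edited map
ceph osd setcrushmap -i crush2.bin    # inject it back into the cluster

Be aware that injecting a map with changed tunables can trigger data movement.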
> >> >>
> >> >> I checked one of the degraded PGs, 10.3c4:
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph pg dump 2>&1 | grep -w 10.3c4
> >> >> 10.3c4  3593  0  3593  0  0  14769891858  10076  10076  active+undersized+degraded  2019-06-13 08:19:39.802219  37380'71900564  37380:119411139  [9,109]  9  [9,109]  9  33550'69130424  2019-06-08 02:28:40.508790  33550'69130424  2019-06-08 02:28:40.508790  18
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph pg 10.3c4 query | jq '.["peer_info"][] | {peer: .peer, last_update: .last_update}'
> >> >> {
> >> >>   "peer": "0",
> >> >>   "last_update": "36847'71412720"
> >> >> }
> >> >> {
> >> >>   "peer": "109",
> >> >>   "last_update": "37380'71900570"
> >> >> }
> >> >> {
> >> >>   "peer": "117",
> >> >>   "last_update": "0'0"
> >> >> }
> >> >>
> >> >> [root@ceph-mon-01 ~]#
> >> >>
> >> >> I have checked the space taken by this PG on the storage nodes.
> >> >> Here is how to check where a particular OSD is (on which physical
> >> >> storage node):
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 9 "
> >> >> | 9   | stor-a02 | 2063G | 5386G | 52 | 1347k | 53 | 292k  | exists,up |
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 109 "
> >> >> | 109 | stor-a01 | 1285G | 4301G | 5  | 31.0k | 6  | 59.2k | exists,up |
> >> >> [root@ceph-mon-01 ~]# watch ceph -s
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 117 "
> >> >> | 117 | stor-b02 | 1334G | 4252G | 54 | 1216k | 13 | 27.4k | exists,up |
> >> >> [root@ceph-mon-01 ~]# ceph osd status 2>&1 | grep " 0 "
> >> >> | 0   | stor-a01 | 2156G | 5293G | 58 | 387k  | 29 | 30.7k | exists,up |
> >> >> [root@ceph-mon-01 ~]#
> >> >>
> >> >> and checking the sizes on the servers:
> >> >>
> >> >> stor-a01 (this PG shouldn't have two copies on the same host):
> >> >> [root@stor-a01 /var/lib/ceph/osd/ceph-0/current]# du -sh 10.3c4_*
> >> >> 2.4G  10.3c4_head
> >> >> 0     10.3c4_TEMP
> >> >> [root@stor-a01 /var/lib/ceph/osd/ceph-109/current]# du -sh 10.3c4_*
> >> >> 14G   10.3c4_head
> >> >> 0     10.3c4_TEMP
> >> >>
> >> >> stor-a02:
> >> >> [root@stor-a02 /var/lib/ceph/osd/ceph-9/current]# du -sh 10.3c4_*
> >> >> 14G   10.3c4_head
> >> >> 0     10.3c4_TEMP
> >> >>
> >> >> stor-b02:
> >> >> [root@stor-b02 /var/lib/ceph/osd/ceph-117/current]# du -sh 10.3c4_*
> >> >> zsh: no matches found: 10.3c4_*
> >> >>
> >> >> Information about Ceph:
> >> >>
> >> >> [root@ceph-mon-01 ~]# ceph versions
> >> >> {
> >> >>     "mon": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >>     },
> >> >>     "mgr": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >>     },
> >> >>     "osd": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 144
> >> >>     },
> >> >>     "mds": {},
> >> >>     "rbd-mirror": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 3
> >> >>     },
> >> >>     "rgw": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 6
> >> >>     },
> >> >>     "overall": {
> >> >>         "ceph version 12.2.9 (9e300932ef8a8916fb3fda78c58691a6ab0f4217) luminous (stable)": 159
> >> >>     }
> >> >> }
> >> >>
> >> >> crushmap: https://pastebin.com/cpC2WmyS
> >> >> ceph osd tree: https://pastebin.com/XvZ2cNZZ
> >> >>
> >> >> I'm cross-posting this to devel because maybe there is some known
> >> >> bug in this particular version of Ceph, and you could point me in
> >> >> some direction to fix this problem.
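One more idea: the posted crushmap can be replayed offline to check whether the rule is able to map three distinct OSDs for every input, which is exactly what undersized PGs fail at. A sketch, reusing crush.bin from the earlier sketch (rule id 0 is a placeholder; pick the rule of the affected pool):

crushtool -i crush.bin --test --rule 0 --num-rep 3 --show-bad-mappings

If CRUSH gives up before finding three OSDs for some inputs, --show-bad-mappings prints those inputs; an empty output would suggest the map itself is fine and the problem lies elsewhere.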
> >> >>
> >> >> --
> >> >> Regards
> >> >> Lukasz
> >>
> >> --
> >> Regards,
> >> Luk
>
> --
> Regards,
> Luk