Vasiliy,

I don't think that's the cause. Can you paste other tuning options from
your ceph.conf?

Also, have you fixed the problems with cephx auth?
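
If it is easier, you can also dump the running config straight from a daemon;
something like this should do it (osd.0 is just an example id, run it on that
OSD's host):

    ceph daemon osd.0 config show | grep -E 'mon_osd_down_out|osd_heartbeat'
    ceph auth list          # and this lists the cephx keys the cluster knows about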

Bob

On Mon, Nov 30, 2015 at 12:56 AM, Vasiliy Angapov <anga...@gmail.com> wrote:

> Btw, in my configuration "mon osd down out subtree limit" is set to "host".
> Does that influence things?
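>
> (If it matters, the effective value can be read from a monitor with
> "ceph daemon mon.slpeah002 config get mon_osd_down_out_subtree_limit";
> the mon name there is just one of mine.)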
>
> 2015-11-29 14:38 GMT+08:00 Vasiliy Angapov <anga...@gmail.com>:
> > Bob,
> > Thanks for the explanation, sounds reasonable! But how could it happen
> > that the host is down while its OSDs are still IN the cluster?
> > I mean, the NOOUT flag is not set and my timeouts are all at their defaults...
> >
> > But if I remember correctly, the host was not completely down: it was
> > pingable, but no other services were reachable, such as SSH.
> > Is it possible that the OSDs were still sending some information to the
> > monitors, making them look IN?
> >
> > 2015-11-29 2:10 GMT+08:00 Bob R <b...@drinksbeer.org>:
> >> Vasiliy,
> >>
> >> Your OSDs are marked as 'down' but 'in'.
> >>
> >> "Ceph OSDs have two known states that can be combined. Up and Down only
> >> tells you whether the OSD is actively involved in the cluster. OSD
> states
> >> also are expressed in terms of cluster replication: In and Out. Only
> when a
> >> Ceph OSD is tagged as Out does the self-healing process occur"
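> >>
> >> If you do not want to wait for the down-out timer, you can also mark the
> >> failed host's OSDs out by hand so recovery starts; a quick sketch, using
> >> one of your OSD ids as an example:
> >>
> >>     ceph osd out osd.1     # mark osd.1 out; backfill/recovery can begin
> >>     ceph osd tree          # confirm it now shows as down and out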
> >>
> >> Bob
> >>
> >> On Fri, Nov 27, 2015 at 6:15 AM, Mart van Santen <m...@greenhost.nl> wrote:
> >>>
> >>>
> >>> Dear Vasiliy,
> >>>
> >>>
> >>>
> >>> On 11/27/2015 02:00 PM, Irek Fasikhov wrote:
> >>>
> >>> Is your time synchronized?
> >>>
> >>> Best regards, Irek Fasikhov
> >>> Mob.: +79229045757
> >>>
> >>> 2015-11-27 15:57 GMT+03:00 Vasiliy Angapov <anga...@gmail.com>:
> >>>>
> >>>> > It seems that you played around with the crushmap and did something
> >>>> > wrong.
> >>>> > Compare the output of 'ceph osd tree' and the crushmap. There are some
> >>>> > 'osd' devices renamed to 'device'; I think that is where your problem is.
> >>>> Is this actually a mistake? What I did was remove a bunch of OSDs from
> >>>> my cluster, which is why the numbering is sparse. But is it a problem to
> >>>> have sparse OSD numbering?
> >>>
> >>>
> >>> I think this is normal and should not be a problem. I have had the same
> >>> thing previously.
> >>>
> >>>>
> >>>> > Hi.
> >>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weights:
> >>>> > -3 14.56000     host slpeah001
> >>>> > -2 14.56000     host slpeah002
> >>>> What exactly is wrong here?
> >>>
> >>>
> >>> I do not know exactly how the host weights contribute to determining
> >>> where to store the third copy of a PG. As you explained, you have enough
> >>> space on all hosts, but if the host weights do not add up correctly,
> >>> CRUSH may conclude that it is unable to place the PGs. What you can try
> >>> is to artificially raise the weights of these hosts, to see if it starts
> >>> mapping the third copies of the PGs onto the available hosts.
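> >>>
> >>> For what it is worth, one way to do that is to edit the decompiled map
> >>> and inject it back. A rough sketch (the file names are just examples,
> >>> and any weight change triggers data movement, so double-check the edit
> >>> first):
> >>>
> >>>     ceph osd getcrushmap -o crushmap.bin       # export the compiled map
> >>>     crushtool -d crushmap.bin -o crushmap.txt  # decompile to plain text
> >>>     # edit the "item slpeah001 weight ..." lines under "root default"
> >>>     crushtool -c crushmap.txt -o crushmap.new  # recompile
> >>>     ceph osd setcrushmap -i crushmap.new       # inject the new map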
> >>>
> >>> I had a similar problem in the past, which was solved by upgrading to
> >>> the latest CRUSH tunables. But be aware that this can cause massive data
> >>> movement.
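> >>>
> >>> If you go the tunables route, these two commands should show and change
> >>> the profile (again, expect data movement when you switch):
> >>>
> >>>     ceph osd crush show-tunables     # inspect the current tunables
> >>>     ceph osd crush tunables optimal  # move to the latest profile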
> >>>
> >>>
> >>>>
> >>>> I also found out that my OSD logs are full of such records:
> >>>> 2015-11-26 08:31:19.273268 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:19.273276 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
> >>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a520).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:24.273207 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:24.273225 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:24.273231 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
> >>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a3c0).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:29.273199 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:29.273215 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:29.273222 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
> >>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a260).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:34.273469 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:34.273482 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:34.273486 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x3f90b000
> >>>> sd=79 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee1a100).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:39.273310 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:39.273331 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:39.273342 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
> >>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19fa0).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:44.273753 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:44.273769 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:44.273776 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fcc000
> >>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee189a0).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:49.273412 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>> 2015-11-26 08:31:49.273431 7fe4f49b1700  0 cephx: verify_authorizer
> >>>> could not get service secret for service osd secret_id=2924
> >>>> 2015-11-26 08:31:49.273455 7fe4f49b1700  0 --
> >>>> 192.168.254.18:6816/110740 >> 192.168.254.12:0/1011754 pipe(0x41fd1000
> >>>> sd=98 :6816 s=0 pgs=0 cs=0 l=1 c=0x3ee19080).accept: got bad
> >>>> authorizer
> >>>> 2015-11-26 08:31:54.273293 7fe4f49b1700  0 auth: could not find
> >>>> secret_id=2924
> >>>>
> >>>> What does it mean? Google says it might be a time sync issue, but my
> >>>> clocks are perfectly synchronized...
> >>>
> >>>
> >>> Normally you get a warning in "ceph status" if the time is out of sync.
> >>> Nevertheless, you can try restarting the OSDs. I had timing issues in
> >>> the past and found that it sometimes helps to restart the daemons
> >>> *after* syncing the clocks, before they accept the new time. That was
> >>> mostly the case with monitors, though.
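> >>>
> >>> Something like this should tell you whether the monitors see any skew
> >>> and let you bounce a single OSD (the id and the sysvinit syntax are just
> >>> examples; use whatever init system your distribution runs):
> >>>
> >>>     ceph health detail | grep -i skew    # monitors report clock skew here
> >>>     ntpq -p                              # check the NTP peers on the node
> >>>     /etc/init.d/ceph restart osd.1       # restart one OSD daemon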
> >>>
> >>>
> >>>
> >>> Regards,
> >>>
> >>>
> >>> Mart
> >>>
> >>>
> >>>
> >>>
> >>>>
> >>>> 2015-11-26 21:05 GMT+08:00 Irek Fasikhov <malm...@gmail.com>:
> >>>> > Hi.
> >>>> > Vasiliy, yes it is a problem with the crushmap. Look at the weights:
> >>>> > " -3 14.56000     host slpeah001
> >>>> >  -2 14.56000     host slpeah002
> >>>> >  "
> >>>> >
> >>>> > Best regards, Irek Fasikhov
> >>>> > Mob.: +79229045757
> >>>> >
> >>>> > 2015-11-26 13:16 GMT+03:00 ЦИТ РТ-Курамшин Камиль Фидаилевич
> >>>> > <kamil.kurams...@tatar.ru>:
> >>>> >>
> >>>> >> It seems that you played around with the crushmap and did something
> >>>> >> wrong.
> >>>> >> Compare the output of 'ceph osd tree' and the crushmap. There are some
> >>>> >> 'osd' devices renamed to 'device'; I think that is where your problem is.
> >>>> >>
> >>>> >> Sent from a mobile device.
> >>>> >>
> >>>> >>
> >>>> >> -----Original Message-----
> >>>> >> From: Vasiliy Angapov <anga...@gmail.com>
> >>>> >> To: ceph-users <ceph-users@lists.ceph.com>
> >>>> >> Sent: Thu, 26 Nov 2015 7:53
> >>>> >> Subject: [ceph-users] Undersized pgs problem
> >>>> >>
> >>>> >> Hi, colleagues!
> >>>> >>
> >>>> >> I have a small 4-node Ceph cluster (0.94.2); all pools have size 3
> >>>> >> and min_size 1.
> >>>> >> Last night one host failed and the cluster was unable to rebalance,
> >>>> >> saying there are a lot of undersized PGs.
> >>>> >>
> >>>> >> root@slpeah002:[~]:# ceph -s
> >>>> >>     cluster 78eef61a-3e9c-447c-a3ec-ce84c617d728
> >>>> >>      health HEALTH_WARN
> >>>> >>             1486 pgs degraded
> >>>> >>             1486 pgs stuck degraded
> >>>> >>             2257 pgs stuck unclean
> >>>> >>             1486 pgs stuck undersized
> >>>> >>             1486 pgs undersized
> >>>> >>             recovery 80429/555185 objects degraded (14.487%)
> >>>> >>             recovery 40079/555185 objects misplaced (7.219%)
> >>>> >>             4/20 in osds are down
> >>>> >>             1 mons down, quorum 1,2 slpeah002,slpeah007
> >>>> >>      monmap e7: 3 mons at
> >>>> >> {slpeah001=192.168.254.11:6780/0,slpeah002=192.168.254.12:6780/0,slpeah007=172.31.252.46:6789/0}
> >>>> >>             election epoch 710, quorum 1,2 slpeah002,slpeah007
> >>>> >>      osdmap e14062: 20 osds: 16 up, 20 in; 771 remapped pgs
> >>>> >>       pgmap v7021316: 4160 pgs, 5 pools, 1045 GB data, 180 kobjects
> >>>> >>             3366 GB used, 93471 GB / 96838 GB avail
> >>>> >>             80429/555185 objects degraded (14.487%)
> >>>> >>             40079/555185 objects misplaced (7.219%)
> >>>> >>                 1903 active+clean
> >>>> >>                 1486 active+undersized+degraded
> >>>> >>                  771 active+remapped
> >>>> >>   client io 0 B/s rd, 246 kB/s wr, 67 op/s
> >>>> >>
> >>>> >>   root@slpeah002:[~]:# ceph osd tree
> >>>> >> ID  WEIGHT   TYPE NAME          UP/DOWN REWEIGHT PRIMARY-AFFINITY
> >>>> >>  -1 94.63998 root default
> >>>> >>  -9 32.75999     host slpeah007
> >>>> >>  72  5.45999         osd.72          up  1.00000          1.00000
> >>>> >>  73  5.45999         osd.73          up  1.00000          1.00000
> >>>> >>  74  5.45999         osd.74          up  1.00000          1.00000
> >>>> >>  75  5.45999         osd.75          up  1.00000          1.00000
> >>>> >>  76  5.45999         osd.76          up  1.00000          1.00000
> >>>> >>  77  5.45999         osd.77          up  1.00000          1.00000
> >>>> >> -10 32.75999     host slpeah008
> >>>> >>  78  5.45999         osd.78          up  1.00000          1.00000
> >>>> >>  79  5.45999         osd.79          up  1.00000          1.00000
> >>>> >>  80  5.45999         osd.80          up  1.00000          1.00000
> >>>> >>  81  5.45999         osd.81          up  1.00000          1.00000
> >>>> >>  82  5.45999         osd.82          up  1.00000          1.00000
> >>>> >>  83  5.45999         osd.83          up  1.00000          1.00000
> >>>> >>  -3 14.56000     host slpeah001
> >>>> >>   1  3.64000          osd.1         down  1.00000          1.00000
> >>>> >>  33  3.64000         osd.33        down  1.00000          1.00000
> >>>> >>  34  3.64000         osd.34        down  1.00000          1.00000
> >>>> >>  35  3.64000         osd.35        down  1.00000          1.00000
> >>>> >>  -2 14.56000     host slpeah002
> >>>> >>   0  3.64000         osd.0           up  1.00000          1.00000
> >>>> >>  36  3.64000         osd.36          up  1.00000          1.00000
> >>>> >>  37  3.64000         osd.37          up  1.00000          1.00000
> >>>> >>  38  3.64000         osd.38          up  1.00000          1.00000
> >>>> >>
> >>>> >> Crushmap:
> >>>> >>
> >>>> >>  # begin crush map
> >>>> >> tunable choose_local_tries 0
> >>>> >> tunable choose_local_fallback_tries 0
> >>>> >> tunable choose_total_tries 50
> >>>> >> tunable chooseleaf_descend_once 1
> >>>> >> tunable chooseleaf_vary_r 1
> >>>> >> tunable straw_calc_version 1
> >>>> >> tunable allowed_bucket_algs 54
> >>>> >>
> >>>> >> # devices
> >>>> >> device 0 osd.0
> >>>> >> device 1 osd.1
> >>>> >> device 2 device2
> >>>> >> device 3 device3
> >>>> >> device 4 device4
> >>>> >> device 5 device5
> >>>> >> device 6 device6
> >>>> >> device 7 device7
> >>>> >> device 8 device8
> >>>> >> device 9 device9
> >>>> >> device 10 device10
> >>>> >> device 11 device11
> >>>> >> device 12 device12
> >>>> >> device 13 device13
> >>>> >> device 14 device14
> >>>> >> device 15 device15
> >>>> >> device 16 device16
> >>>> >> device 17 device17
> >>>> >> device 18 device18
> >>>> >> device 19 device19
> >>>> >> device 20 device20
> >>>> >> device 21 device21
> >>>> >> device 22 device22
> >>>> >> device 23 device23
> >>>> >> device 24 device24
> >>>> >> device 25 device25
> >>>> >> device 26 device26
> >>>> >> device 27 device27
> >>>> >> device 28 device28
> >>>> >> device 29 device29
> >>>> >> device 30 device30
> >>>> >> device 31 device31
> >>>> >> device 32 device32
> >>>> >> device 33 osd.33
> >>>> >> device 34 osd.34
> >>>> >> device 35 osd.35
> >>>> >> device 36 osd.36
> >>>> >> device 37 osd.37
> >>>> >> device 38 osd.38
> >>>> >> device 39 device39
> >>>> >> device 40 device40
> >>>> >> device 41 device41
> >>>> >> device 42 device42
> >>>> >> device 43 device43
> >>>> >> device 44 device44
> >>>> >> device 45 device45
> >>>> >> device 46 device46
> >>>> >> device 47 device47
> >>>> >> device 48 device48
> >>>> >> device 49 device49
> >>>> >> device 50 device50
> >>>> >> device 51 device51
> >>>> >> device 52 device52
> >>>> >> device 53 device53
> >>>> >> device 54 device54
> >>>> >> device 55 device55
> >>>> >> device 56 device56
> >>>> >> device 57 device57
> >>>> >> device 58 device58
> >>>> >> device 59 device59
> >>>> >> device 60 device60
> >>>> >> device 61 device61
> >>>> >> device 62 device62
> >>>> >> device 63 device63
> >>>> >> device 64 device64
> >>>> >> device 65 device65
> >>>> >> device 66 device66
> >>>> >> device 67 device67
> >>>> >> device 68 device68
> >>>> >> device 69 device69
> >>>> >> device 70 device70
> >>>> >> device 71 device71
> >>>> >> device 72 osd.72
> >>>> >> device 73 osd.73
> >>>> >> device 74 osd.74
> >>>> >> device 75 osd.75
> >>>> >> device 76 osd.76
> >>>> >> device 77 osd.77
> >>>> >> device 78 osd.78
> >>>> >> device 79 osd.79
> >>>> >> device 80 osd.80
> >>>> >> device 81 osd.81
> >>>> >> device 82 osd.82
> >>>> >> device 83 osd.83
> >>>> >>
> >>>> >> # types
> >>>> >> type 0 osd
> >>>> >> type 1 host
> >>>> >> type 2 chassis
> >>>> >> type 3 rack
> >>>> >> type 4 row
> >>>> >> type 5 pdu
> >>>> >> type 6 pod
> >>>> >> type 7 room
> >>>> >> type 8 datacenter
> >>>> >> type 9 region
> >>>> >> type 10 root
> >>>> >>
> >>>> >> # buckets
> >>>> >> host slpeah007 {
> >>>> >>         id -9           # do not change unnecessarily
> >>>> >>         # weight 32.760
> >>>> >>         alg straw
> >>>> >>         hash 0  # rjenkins1
> >>>> >>         item osd.72 weight 5.460
> >>>> >>         item osd.73 weight 5.460
> >>>> >>         item osd.74 weight 5.460
> >>>> >>         item osd.75 weight 5.460
> >>>> >>         item osd.76 weight 5.460
> >>>> >>         item osd.77 weight 5.460
> >>>> >> }
> >>>> >> host slpeah008 {
> >>>> >>         id -10          # do not change unnecessarily
> >>>> >>         # weight 32.760
> >>>> >>         alg straw
> >>>> >>         hash 0  # rjenkins1
> >>>> >>         item osd.78 weight 5.460
> >>>> >>         item osd.79 weight 5.460
> >>>> >>         item osd.80 weight 5.460
> >>>> >>         item osd.81 weight 5.460
> >>>> >>         item osd.82 weight 5.460
> >>>> >>         item osd.83 weight 5.460
> >>>> >> }
> >>>> >> host slpeah001 {
> >>>> >>         id -3           # do not change unnecessarily
> >>>> >>         # weight 14.560
> >>>> >>         alg straw
> >>>> >>         hash 0  # rjenkins1
> >>>> >>         item osd.1 weight 3.640
> >>>> >>         item osd.33 weight 3.640
> >>>> >>         item osd.34 weight 3.640
> >>>> >>         item osd.35 weight 3.640
> >>>> >> }
> >>>> >> host slpeah002 {
> >>>> >>         id -2           # do not change unnecessarily
> >>>> >>         # weight 14.560
> >>>> >>         alg straw
> >>>> >>         hash 0  # rjenkins1
> >>>> >>         item osd.0 weight 3.640
> >>>> >>         item osd.36 weight 3.640
> >>>> >>         item osd.37 weight 3.640
> >>>> >>         item osd.38 weight 3.640
> >>>> >> }
> >>>> >> root default {
> >>>> >>         id -1           # do not change unnecessarily
> >>>> >>         # weight 94.640
> >>>> >>         alg straw
> >>>> >>         hash 0  # rjenkins1
> >>>> >>         item slpeah007 weight 32.760
> >>>> >>         item slpeah008 weight 32.760
> >>>> >>         item slpeah001 weight 14.560
> >>>> >>         item slpeah002 weight 14.560
> >>>> >> }
> >>>> >>
> >>>> >> # rules
> >>>> >> rule default {
> >>>> >>         ruleset 0
> >>>> >>         type replicated
> >>>> >>         min_size 1
> >>>> >>         max_size 10
> >>>> >>         step take default
> >>>> >>         step chooseleaf firstn 0 type host
> >>>> >>         step emit
> >>>> >> }
> >>>> >>
> >>>> >> # end crush map
> >>>> >>
> >>>> >>
> >>>> >>
> >>>> >> This is odd because the pools have size 3 and I have 3 hosts alive, so
> >>>> >> why is it saying that undersized PGs are present? It makes me feel like
> >>>> >> CRUSH is not working properly.
> >>>> >> There is not much data in the cluster at the moment, about 3 TB, and
> >>>> >> as you can see from the osd tree each host has at least 14 TB of disk
> >>>> >> space on its OSDs.
> >>>> >> So I'm a bit stuck now...
> >>>> >> How can I find the source of the trouble?
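> >>>> >>
> >>>> >> Would querying one of the stuck PGs show why CRUSH cannot map a third
> >>>> >> copy, e.g. something along these lines (the pg id below is just an
> >>>> >> example)?
> >>>> >>
> >>>> >>     ceph pg dump_stuck unclean | head   # list some of the stuck PGs
> >>>> >>     ceph pg 2.1f query                  # inspect its up/acting sets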
> >>>> >>
> >>>> >> Thanks in advance!
> >>>> >> _______________________________________________
> >>>> >> ceph-users mailing list
> >>>> >> ceph-users@lists.ceph.com
> >>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>> >>
> >>>> >
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>>
> >>> --
> >>> Mart van Santen
> >>> Greenhost
> >>> E: m...@greenhost.nl
> >>> T: +31 20 4890444
> >>> W: https://greenhost.nl
> >>>
> >>> A PGP signature can be attached to this e-mail,
> >>> you need PGP software to verify it.
> >>> My public key is available in keyserver(s)
> >>> see: http://tinyurl.com/openpgp-manual
> >>>
> >>> PGP Fingerprint: CA85 EB11 2B70 042D AF66  B29A 6437 01A1 10A3 D3A5
> >>>
> >>>
> >>> _______________________________________________
> >>> ceph-users mailing list
> >>> ceph-users@lists.ceph.com
> >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>
> >>
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com