[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Hi,

The ceph.log from when you upgraded should give some clues.
Are you using upmap balancing? Maybe this is just further
refinement of the balancing.
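
A quick way to check (assuming a Nautilus-era CLI):

   ceph balancer status
   ceph osd dump | grep pg_upmap

The first shows whether the balancer is active and in upmap mode; the
second lists any pg_upmap entries currently in the osdmap.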

-- dan

On Thu, Mar 5, 2020 at 8:58 AM Rainer Krienke  wrote:
>
> Hello,
>
> at the moment my ceph is still working, but in a degraded state after I
> upgraded one (of 9) hosts from 14.2.7 to 14.2.8 and rebooted this host
> (node2, one of the 3 monitors) after the upgrade.
>
> Usually before rebooting I set
>
>ceph osd set noout
>ceph osd set nobackfill
>ceph osd set norecover
>
> before rebooting, but I forgot this time. After realizing my error I
> thought: OK, I forgot to set the flags, but I have configured
> mon_osd_down_out_interval to 900 sec:
>
> # ceph config get mon.mon_osd_down_out_interval
> WHO  MASK  LEVEL     OPTION                     VALUE  RO
> mon        advanced  mon_osd_down_out_interval  900
>
> The reboot took about 5 minutes, so I expected nothing to happen. But it
> did, and now I do not understand why, or whether there are more timeout
> values I could/should set to avoid this happening again if I ever forget
> to set the noout, nobackfill and norecover flags before a reboot.
>
>
> Thanks if anyone can explain to me what might have happened
> Rainer
>
>
>
> The current ceph state is:
> # ceph -s
>   cluster:
> id: xyz
> health: HEALTH_WARN
> Degraded data redundancy: 191629/76527549 objects degraded
> (0.250%), 18 pgs degraded, 18 pgs undersized
>
>   services:
> mon: 3 daemons, quorum node2,node5,node8 (age 51m)
> mgr: node5(active, since 53m), standbys: node8, node-admin, node2
> mds: mycephfs:1 {0=node3=up:active} 2 up:standby
> osd: 144 osds: 144 up (since 51m), 144 in (since 3M); 48 remapped pgs
>
>   data:
> pools:   13 pools, 3460 pgs
> objects: 12.76M objects, 48 TiB
> usage:   95 TiB used, 429 TiB / 524 TiB avail
> pgs: 191629/76527549 objects degraded (0.250%)
>  3098164/76527549 objects misplaced (4.048%)
>  3412 active+clean
>  30   active+remapped+backfill_wait
>  13   active+undersized+degraded+remapped+backfill_wait
>  5active+undersized+degraded+remapped+backfilling
>
>   io:
> client:   33 MiB/s rd, 7.2 MiB/s wr, 91 op/s rd, 186 op/s wr
> recovery: 83 MiB/s, 20 objects/s
> --
> Rainer Krienke, Uni Koblenz, Rechenzentrum, A22, Universitaetsstrasse 1
> 56070 Koblenz, Tel: +49261287 1312 Fax +49261287 100 1312
> Web: http://userpages.uni-koblenz.de/~krienke
> PGP: http://userpages.uni-koblenz.de/~krienke/mypgp.html
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Identify slow ops

2020-03-05 Thread Thomas Schneider
Hi,

I have stopped all 3 MON services sequentially.
After starting the 3 MON services again, the slow ops were gone.
However, just after 1 min. of MON service uptime, the slow ops are back
again, and the blocked time is increasing constantly.

root@ld3955:/home/ceph-scripts
# ceph -w
  cluster:
    id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_WARN
    17 nearfull osd(s)
    1 pool(s) nearfull
    2 slow ops, oldest one blocked for 63 sec, mon.ld5505 has
slow ops

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 67s)
    mgr: ld5505(active, since 11d), standbys: ld5506, ld5507
    mds: cephfs:2 {0=ld5507=up:active,1=ld5505=up:active} 2
up:standby-replay 3 up:standby
    osd: 442 osds: 442 up (since 4w), 442 in (since 4w); 10 remapped pgs

  data:
    pools:   7 pools, 19628 pgs
    objects: 72.14M objects, 275 TiB
    usage:   826 TiB used, 705 TiB / 1.5 PiB avail
    pgs: 16920/216422157 objects misplaced (0.008%)
 19618 active+clean
 10    active+remapped+backfilling

  io:
    client:   454 KiB/s rd, 15 MiB/s wr, 905 op/s rd, 463 op/s wr
    recovery: 125 MiB/s, 31 objects/s


2020-03-05 09:21:48.647440 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 63 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:21:53.648708 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 68 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:21:58.650186 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 73 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:03.651447 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 78 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:08.653066 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 83 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:13.654699 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 88 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:18.655912 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 93 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:23.657263 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 98 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:28.658514 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 103 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:33.659965 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 108 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:38.661360 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 113 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:43.662727 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 118 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:48.663940 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 123 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:53.685451 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 128 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:22:58.691603 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 133 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:03.692841 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 138 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:08.694502 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 143 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:13.695991 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 148 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:18.697689 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 153 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:23.698945 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 158 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:28.700331 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 163 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:33.701754 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 168 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:38.703021 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 173 sec, mon.ld5505 has slow ops (SLOW_OPS)
2020-03-05 09:23:43.704396 mon.ld5505 [WRN] Health check update: 2 slow
ops, oldest one blocked for 178 sec, mon.ld5505 has slow ops (SLOW_OPS)

I have the impression that this is not a harmless bug anymore.

Please advise how to proceed.
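
In case it helps, I can also dump the ops currently stuck on the mon via
its admin socket, e.g.:

   ceph daemon mon.ld5505 ops

and post the output here.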

THX


On 17.02.2020 at 18:31, Paul Emmerich wrote:
> that's probably just https://tracker.ceph.com/issues/43893
> (a harmless bug)
>
> Restart the mons to get rid of the message
>
> Paul
>
> -- Paul Emmerich Looking for help with your Ceph cluster? Contact us
> at http

[ceph-users] Re: consistency of import-diff

2020-03-05 Thread Janne Johansson
On Thu, 5 Mar 2020 at 08:13, Stefan Priebe - Profihost AG <
s.pri...@profihost.ag> wrote:

> >> Hrm. We have checksums on the actual OSD data, so it ought to be
> >> possible to add these to the export/import/diff bits so it can be
> >> verified faster.
> >> (Well, barring bugs.)
> >>
> > I mainly meant bugs, I should have clarified that better.
> >
> > Do you trust the technology you want to backup to create the proper
> > backup for you? With that I mean, what if librbd or librados contains a
> > bug which corrupts all your backups?
> >
> > You think the backups all went fine because the snapshots seem
> > consistent on both ends, but you are not sure until you actually test a
> > restore.
>
> Yes and no. If the object checksums inside Ceph are equal it must be a
> really bad bug. Sure, this can happen, but I think the chances are very low.
>

Are we talking about checksums from months ago when the data was created, or
the "current" data, which may or may not have been changed, or not fully
copied over to the destination in this case?

It smells a bit like shuffling the vocabulary around to avoid ending up at
"dang, in order to actually know, something must actually read the 100 TB of
data", which I think is where you have to go to move from "I think it's ok"
to "I know it is ok" after a copy/move/rebuild from A to B.

I'm not trying to be obtuse or anything, just noting that when you sync
something from A to B and Ceph claims it is there, it has done as good a job
as it can to check that the operation was done 100%.

There may be reasons to distrust this or not, but when you DO decide to
mistrust it, it feels weird to move backwards again and go "well, TCP
checksums would have caught transmission errors, filesystem/OSD checksums
should have caught storage errors, and ...", because the idea, if I
understood it correctly, was to remove doubt about whether all the various
levels of operations have actually managed to create a perfect copy or not,
not just to list the things that made Ceph probably-ok-but-I-don't-know to
begin with.
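
(The brute-force way to actually know would be to read the data back on
both ends and compare digests -- a sketch, assuming the same snapshot
exists on source and destination:

   rbd export pool/image@snap - | sha256sum        # on the source cluster
   rbd export backuppool/image@snap - | sha256sum  # on the backup cluster

Expensive, but it is the only thing that really removes the doubt.)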

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
I found some information in ceph.log that might help to find out what
happened. node2  was the one I rebooted:

2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
scrub starts
2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
scrub ok
2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
osdmap e31855: 144 total, 144 up, 144 in
2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
osdmap e31856: 144 total, 144 up, 144 in
2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
Health check failed: 1 pools have many more objects per pg than average
(MANY_OBJECTS_PER_PG)
2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
osdmap e31857: 144 total, 144 up, 144 in
2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
osdmap e31858: 144 total, 144 up, 144 in
2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.634572 osd.5 (osd.5) 352 : cluster [DBG] 36.3f2s0
starting backfill to osd.71(4) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.641651 osd.86 (osd.86) 460 : cluster [DBG] 36.3ees0
starting backfill to osd.90(0) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.644983 osd.5 (osd.5) 353 : cluster [DBG] 36.3f2s0
starting backfill to osd.122(5) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.649661 osd.86 (osd.86) 461 : cluster [DBG] 36.3ees0
starting backfill to osd.118(2) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.652407 osd.5 (osd.5) 354 : cluster [DBG] 36.3f2s0
starting backfill to osd.131(2) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.659823 osd.86 (osd.86) 462 : cluster [DBG] 36.3ees0
starting backfill to osd.139(1) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:42.055680 mon.node2 (mon.0) 692729 : cluster [INF]
osd.23 marked itself down
2020-03-05 07:24:42.055765 mon.node2 (mon.0) 692730 : cluster [INF]
osd.18 marked itself down
2020-03-05 07:24:42.055919 mon.node2 (mon.0) 692731 : cluster [INF]
osd.21 marked itself down
2020-03-05 07:24:42.056002 mon.node2 (mon.0) 692732 : cluster [INF]
osd.24 marked itself down
2020-03-05 07:24:42.056250 mon.node2 (mon.0) 692733 : cluster [INF]
osd.17 marked itself down
2020-03-05 07:24:42.058049 mon.node2 (mon.0) 692734 : cluster [INF]
osd.16 marked itself down
2020-03-05 07:24:42.064002 mon.node2 (mon.0) 692735 : cluster [INF]
osd.31 marked itself down
2020-03-05 07:24:42.069635 mon.node2 (mon.0) 692736 : cluster [INF]
osd.26 marked itself down
2020-03-05 07:24:42.075325 mon.node2 (mon.0) 692737 : cl

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Did you have `144 total, 144 up, 144 in` also before the upgrade?
If an osd was out, then you upgraded/restarted and it went back in, it
would trigger data movement.
(I usually set noin before an upgrade).
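
That is, something like:

   ceph osd set noin
   ... do the upgrade and reboot ...
   ceph osd unset noin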

-- dan

On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  wrote:
>
> I found some information in ceph.log that might help to find out what
> happened. node2  was the one I rebooted:
>
> 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
> scrub starts
> 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
> scrub ok
> 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
> osdmap e31855: 144 total, 144 up, 144 in
> 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
> osdmap e31856: 144 total, 144 up, 144 in
> 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
> Health check failed: 1 pools have many more objects per pg than average
> (MANY_OBJECTS_PER_PG)
> 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
> osdmap e31857: 144 total, 144 up, 144 in
> 2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
> osdmap e31858: 144 total, 144 up, 144 in
> 2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
> 36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
> 36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
> 36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
> 36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
> 36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
> 36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
> 2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
> starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
> starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
> starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
> starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
> starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
> starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
> starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
> starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
> starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
> 2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
> starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:41.634572 osd.5 (osd.5) 352 : cluster [DBG] 36.3f2s0
> starting backfill to osd.71(4) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.641651 osd.86 (osd.86) 460 : cluster [DBG] 36.3ees0
> starting backfill to osd.90(0) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:41.644983 osd.5 (osd.5) 353 : cluster [DBG] 36.3f2s0
> starting backfill to osd.122(5) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.649661 osd.86 (osd.86) 461 : cluster [DBG] 36.3ees0
> starting backfill to osd.118(2) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:41.652407 osd.5 (osd.5) 354 : cluster [DBG] 36.3f2s0
> starting backfill to osd.131(2) from (0'0,0'0] MAX to 31854'331157
> 2020-03-05 07:24:41.659823 osd.86 (osd.86) 462 : cluster [DBG] 36.3ees0
> starting backfill to osd.139(1) from (0'0,0'0] MAX to 31854'511090
> 2020-03-05 07:24:42.055680 mon.node2 (mon.0) 692729 : cluster [INF]
> osd.23 marked itself down
> 2020-03-05 07:24:42.055765 mon.node2 (mon.0) 692730 : cluster [INF]
> osd.18 marked itself down
> 2020-03-05 07:24:42.055919 mon.node2 (mon.0) 692731 : cluster [INF]
> osd.21 marked itself down
> 2020-03-05 07:24:42.056002 mon.node2 (mon.0) 692732 : cluster [INF]
> osd.24 marked itself down
> 2020-03-05 0

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Janek Bevendorff
I also had some inadvertent recovery going on, although I think it 
started after I had restarted all MON, MGR, and MDS nodes and before I 
started restarting OSDs.



On 05/03/2020 09:49, Dan van der Ster wrote:

Did you have `144 total, 144 up, 144 in` also before the upgrade?
If an osd was out, then you upgraded/restarted and it went back in, it
would trigger data movement.
(I usually set noin before an upgrade).

-- dan

On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  wrote:

I found some information in ceph.log that might help to find out what
happened. node2  was the one I rebooted:

2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
scrub starts
2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
scrub ok
2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
osdmap e31855: 144 total, 144 up, 144 in
2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
osdmap e31856: 144 total, 144 up, 144 in
2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
Health check failed: 1 pools have many more objects per pg than average
(MANY_OBJECTS_PER_PG)
2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
osdmap e31857: 144 total, 144 up, 144 in
2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
osdmap e31858: 144 total, 144 up, 144 in
2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.634572 osd.5 (osd.5) 352 : cluster [DBG] 36.3f2s0
starting backfill to osd.71(4) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.641651 osd.86 (osd.86) 460 : cluster [DBG] 36.3ees0
starting backfill to osd.90(0) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.644983 osd.5 (osd.5) 353 : cluster [DBG] 36.3f2s0
starting backfill to osd.122(5) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.649661 osd.86 (osd.86) 461 : cluster [DBG] 36.3ees0
starting backfill to osd.118(2) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:41.652407 osd.5 (osd.5) 354 : cluster [DBG] 36.3f2s0
starting backfill to osd.131(2) from (0'0,0'0] MAX to 31854'331157
2020-03-05 07:24:41.659823 osd.86 (osd.86) 462 : cluster [DBG] 36.3ees0
starting backfill to osd.139(1) from (0'0,0'0] MAX to 31854'511090
2020-03-05 07:24:42.055680 mon.node2 (mon.0) 692729 : cluster [INF]
osd.23 marked itself down
2020-03-05 07:24:42.055765 mon.node2 (mon.0) 692730 : cluster [INF]
osd.18 marked itself down
2020-03-05 07:24:42.055919 mon.node2 (mon.0) 692731 : cluster [INF]
osd.21 marked itself down
2020-03-05 07:24:42.056002 mon.node2 (mon.

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
Hello,

before I ran the update to 14.2.8 I checked that the state was healthy,
with all OSDs up and in. I still have the command history I typed visible
in my KDE terminal buffer, and there I see that after the update but
before the reboot I ran a ceph -s: there were 144 OSDs up and in, and the
state was HEALTH_OK.

Could it be relevant that the rebooted node was a monitor node? And
shouldn't mon_osd_down_out_interval, at least in theory, have prevented
what happened to my cluster?

Thanks
Rainer

On 05.03.20 at 09:49, Dan van der Ster wrote:
> Did you have `144 total, 144 up, 144 in` also before the upgrade?
> If an osd was out, then you upgraded/restarted and it went back in, it
> would trigger data movement.
> (I usually set noin before an upgrade).
> 
> -- dan
> 
> On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  wrote:
>>
>> I found some information in ceph.log that might help to find out what
>> happened. node2  was the one I rebooted:
>>
>> 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
>> scrub starts
>> 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
>> scrub ok
>> 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
>> osdmap e31855: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
>> osdmap e31856: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
>> Health check failed: 1 pools have many more objects per pg than average
>> (MANY_OBJECTS_PER_PG)
>> 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
>> osdmap e31857: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
>> osdmap e31858: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
>> 36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
>> 2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
>> starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
>> starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
>> 2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
>> starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
>> starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
>> 2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
>> starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
>> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
>> starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
>> 2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
>> starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
>> starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
>> 2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
>> starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
>> 2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
>> starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
>> 2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
>> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'511090
>> 2020-03-05 07:24:41.634572 osd.5 (osd.5) 352 : cluster [DBG] 36.3f2s0
>> starting backfill to osd.71(4) from (0'0,0'0] MAX to 31854'331157
>> 2020-03-05 07:24:41.641651 osd.86 (osd.86) 460 : cluster [DBG] 36.3ees0
>> starting backfill to osd.90(0) from (0'0,0'0] MAX to 31854'511090
>> 2020-03-05 07:24:41.644983 osd.5 (osd.5) 353 : cluster [DBG] 36.3f2s0
>> starting backfill to osd.122(5) from (0'0,0'0] MAX to 31854'331157
>> 2020-03-05 07:24:41.649661 osd.86 (osd.86) 461 : cluster [DBG] 36.3ees0
>> starting backfill to osd.118(2) from (0'0,0'0] MAX to 31854'511090
>> 2020-03-05 07:24:41.652407 osd.5 (osd.5) 354 : cluster [DBG] 36

[ceph-users] PGs unknown after pool creation (Nautilus 14.2.4/6)

2020-03-05 Thread dg

Hello,


I have a small Ceph cluster running with 3 MON/MGR and 3 OSD hosts.
There are also 3 virtual hosts in the crushmap to have a separate SSD
pool. Currently two pools are running, one of them exclusive to the SSD
device class.


My problem now is that any new pool I try to create won't become
functional, as all of its new PGs stay in the unknown state. I've tried
varying the pg number, crush ruleset, size and so on, but nothing helped.


My OSDs regularly show an error message like:

"var/log/ceph/ceph-osd.42.log:2020-03-04 17:09:04.641 7f76240c9700  0 
--1- [v2:192.168.44.110:6834/23888,v1:192.168.44.110:6835/23888] >> 
v1:192.168.44.111:6826/484449 conn(0x55901c805800 0x55901d5bf000 :-1 
s=CONNECTING_SEND_CONNECT_MSG pgs=398 cs=186 l=0).handle_connect_reply_2 
connect got BADAUTHORIZER"


but I don't find any reason for that (the clocks are synchronized).

Also, one of my mons is two minor versions newer than the other nodes,
but I would not really like to update the whole cluster right now, as I've
had a somewhat bad experience with the last update :)



Does anyone have any idea what I could try?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Hi,

There was already movement before you rebooted the node, at 07:24:41.598004.
That tells me that a ceph-mon process restarted and either trimmed some
upmaps or did something similar.

You can do this to see exactly what changed:

# ceph osd getmap -o 31853 31853   # this is a guess -- pick an osdmap
epoch that was just before you upgraded.
# ceph osd getmap -o 31856 31856
# diff <(osdmaptool --print 31853) <(osdmaptool --print 31856)

-- dan



On Thu, Mar 5, 2020 at 10:05 AM Rainer Krienke  wrote:
>
> Hello,
>
> before I ran the update to 14.2.8 I checked that the state was healthy
> with all OSDs up and in. I still have the command history I typed
> visible in my kde terminal buffer and there I see that after the update
> but before the reboot I ran a ceph -s and there were 144 osd's up and in
> the state was HEALTH_OK.
>
> Could it be of interest that the node rebooted was a monitor node and
> should mon_osd_down_out_interval at least in theory have prevented what
> happened to my cluster?
>
> Thanks
> Rainer
>
> On 05.03.20 at 09:49, Dan van der Ster wrote:
> > Did you have `144 total, 144 up, 144 in` also before the upgrade?
> > If an osd was out, then you upgraded/restarted and it went back in, it
> > would trigger data movement.
> > (I usually set noin before an upgrade).
> >
> > -- dan
> >
> > On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  
> > wrote:
> >>
> >> I found some information in ceph.log that might help to find out what
> >> happened. node2  was the one I rebooted:
> >>
> >> 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
> >> scrub starts
> >> 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
> >> scrub ok
> >> 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
> >> osdmap e31855: 144 total, 144 up, 144 in
> >> 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
> >> osdmap e31856: 144 total, 144 up, 144 in
> >> 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
> >> Health check failed: 1 pools have many more objects per pg than average
> >> (MANY_OBJECTS_PER_PG)
> >> 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
> >> osdmap e31857: 144 total, 144 up, 144 in
> >> 2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
> >> osdmap e31858: 144 total, 144 up, 144 in
> >> 2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
> >> 36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
> >> 2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.13(1) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.590541 osd.5 (osd.5) 349 : cluster [DBG] 36.3f2s0
> >> starting backfill to osd.10(0) from (0'0,0'0] MAX to 31854'331157
> >> 2020-03-05 07:24:41.596496 osd.69 (osd.69) 550 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.25(5) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.601781 osd.86 (osd.86) 457 : cluster [DBG] 36.3ees0
> >> starting backfill to osd.10(4) from (0'0,0'0] MAX to 31854'511090
> >> 2020-03-05 07:24:41.603864 osd.69 (osd.69) 551 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.58(2) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.610409 osd.69 (osd.69) 552 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.78(3) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.614494 osd.5 (osd.5) 350 : cluster [DBG] 36.3f2s0
> >> starting backfill to osd.41(1) from (0'0,0'0] MAX to 31854'331157
> >> 2020-03-05 07:24:41.617208 osd.69 (osd.69) 553 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.99(0) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.622645 osd.86 (osd.86) 458 : cluster [DBG] 36.3ees0
> >> starting backfill to osd.48(5) from (0'0,0'0] MAX to 31854'511090
> >> 2020-03-05 07:24:41.624049 osd.69 (osd.69) 554 : cluster [DBG] 36.3fes0
> >> starting backfill to osd.121(4) from (0'0,0'0] MAX to 31854'280018
> >> 2020-03-05 07:24:41.625556 osd.5 (osd.5) 351 : cluster [DBG] 36.3f2s0
> >> starting backfill to osd.61(3) from (0'0,0'0] MAX to 31854'331157
> >> 2020-03-05 07:24:41.631348 osd.86 (osd.86) 459 : cluster [DBG] 36.3ees0
> >> starting backfill to osd.78(3) f

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
The difference was not a big one and consists of a change in pgp_num for
a pool named pxa-ec from 1024 to 999. All OSDs were up in the last map
(31856):

# diff 31853.txt 31856.txt
1c1
< epoch 31853
---
> epoch 31856
4c4
< modified 2020-03-04 14:41:52.079327
---
> modified 2020-03-05 07:24:39.938326
24c24
< pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31852
lfor 0/21889/21905 flags hashpspool,ec_overwrites,selfmanaged_snaps
stripe_width 16384 target_size_ratio 0.15 application rbd
---
> pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
rjenkins pg_num 1024 pgp_num 999 pg_num_target 256 pgp_num_target 256
autoscale_mode on last_change 31856 lfor 0/21889/21905 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
target_size_ratio 0.15 application rbd
28c28
< pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31659
lfor 0/28686/28688 flags hashpspool,ec_overwrites,selfmanaged_snaps
stripe_width 16384 target_size_ratio 0.15 application rbd
---
> pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
rjenkins pg_num 1024 pgp_num 1024 pg_num_target 256 pgp_num_target 256
autoscale_mode on last_change 31856 lfor 0/28686/28688 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
target_size_ratio 0.15 application rbd
181d180
< blacklist 141.26.152.64:0/3433151139 expires 2020-03-04 15:16:25.964333

Rainer

On 05.03.20 at 10:19, Dan van der Ster wrote:
> Hi,
> 
> There was movement already before you rebooted the node at 07:24:41.598004.
> That tells me that it was a ceph-mon process that restarted and either
> trimmed some upmaps or something similar.
> 
> You can do this to see exactly what changed:
> 
> # ceph osd getmap -o 31853 31853   # this is a guess -- pick an osdmap
> epoch that was just before you upgraded.
> # ceph osd getmap -o 31856 31856
> # diff <(osdmaptool --print 31853) <(osdmaptool --print 31856)
> 
> -- dan
> 
> 
> 
> On Thu, Mar 5, 2020 at 10:05 AM Rainer Krienke  wrote:
>>
>> Hello,
>>
>> before I ran the update to 14.2.8 I checked that the state was healthy
>> with all OSDs up and in. I still have the command history I typed
>> visible in my kde terminal buffer and there I see that after the update
>> but before the reboot I ran a ceph -s and there were 144 osd's up and in
>> the state was HEALTH_OK.
>>
>> Could it be of interest that the node rebooted was a monitor node and
>> should mon_osd_down_out_interval at least in theory have prevented what
>> happened to my cluster?
>>
>> Thanks
>> Rainer
>>
>> On 05.03.20 at 09:49, Dan van der Ster wrote:
>>> Did you have `144 total, 144 up, 144 in` also before the upgrade?
>>> If an osd was out, then you upgraded/restarted and it went back in, it
>>> would trigger data movement.
>>> (I usually set noin before an upgrade).
>>>
>>> -- dan
>>>
>>> On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  
>>> wrote:

 I found some information in ceph.log that might help to find out what
 happened. node2  was the one I rebooted:

 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
 scrub starts
 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
 scrub ok
 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
 osdmap e31855: 144 total, 144 up, 144 in
 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
 osdmap e31856: 144 total, 144 up, 144 in
 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
 Health check failed: 1 pools have many more objects per pg than average
 (MANY_OBJECTS_PER_PG)
 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
 osdmap e31857: 144 total, 144 up, 144 in
 2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
 osdmap e31858: 144 total, 144 up, 144 in
 2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
 36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
 36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.631869 osd.127 (osd.127) 693 : cluster [DBG]
 36.3eds0 starting backfill to osd.65(2) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.644089 osd.127 (osd.127) 694 : cluster [DBG]
 36.3eds0 starting backfill to osd.97(3) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.656223 osd.127 (osd.127) 695 : cluster [DBG]
 36.3eds0 starting backfill to osd.122(0) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.669265 osd.127 (osd.127) 696 : cluster [DBG]
 36.3eds0 starting backfill to osd.134(1) from (0'0,0'0] MAX to 31854'297918
 2020-03-05 07:24:41.582485 osd.69 (osd.69) 549 : cluster [DBG] 36.3fes0
 star

[ceph-users] Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
Hi all,

There's something broken in our env when we try to add new mons to
existing clusters, confirmed on two clusters running mimic and
nautilus. It's basically this issue
https://tracker.ceph.com/issues/42830

In case something is wrong with our puppet manifests, I'm trying to
doing it manually.

First we --mkfs the mon and start it, but as soon as the new mon
starts synchronizing, the existing leader becomes unresponsive and an
election is triggered.

Here's exactly what I'm doing:

# cd /var/lib/ceph/tmp/
# scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
# ceph mon getmap -o monmap
# ceph-mon --mkfs -i cephmon4 --monmap monmap --keyring
keyring.mon.cephmon4 --setuser ceph --setgroup ceph
# vi /etc/ceph/ceph.conf 
[mon.cephmon4]
host = cephmon4
mon addr = a.b.c.d:6790
# systemctl start ceph-mon@cephmon4
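
(To watch the new mon's progress, its state can also be queried over the
admin socket, e.g.:

   ceph daemon mon.cephmon4 mon_status

which should report "synchronizing" while it catches up.)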

The log file on the new mon shows it starting to synchronize, then
immediately the CPU usage on the leader goes to 100%, elections start
happening, and ceph health shows mon slow ops. A perf top of the
ceph-mon with 100% CPU is shown below [1].
On a small nautilus cluster, the new mon gets added within a minute
or so (but not cleanly -- the leader is unresponsive for quite a while
until the new mon joins). debug_mon=20 on the leader doesn't show
anything very interesting.
On our large mimic cluster we tried waiting more than 10 minutes --
suffering through several mon elections and 100% usage bouncing around
between leaders -- until we gave up.

I'm pulling my hair out a bit on this -- it's really weird!

Did anyone add a new mon to an existing large cluster recently, and it
went smoothly?

Cheers, Dan

[1]

  15.12%  ceph-mon             [.] MonitorDBStore::Transaction::encode
   8.95%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::append
   8.68%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::list::append
   7.69%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::release
   5.86%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::ptr
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Ahh that's it! You have `autoscale_mode on` for the pool, and in
14.2.8 there was a fix to the calculation of how many PGs are needed in
an erasure-coded pool:
https://github.com/ceph/ceph/commit/0253205ef36acc6759a3a9687c5eb1b27aa901bf

So at the moment your PGs are merging.

If you want to stop that change, set autoscale_mode to off or warn for
the relevant pools, then set pg_num back to the current value (1024).
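
For example, for each affected pool:

   ceph osd pool set pxa-ec pg_autoscale_mode warn
   ceph osd pool set pxa-ec pg_num 1024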

-- Dan

On Thu, Mar 5, 2020 at 11:19 AM Rainer Krienke  wrote:
>
> The difference was not a big one and consists in a change in pgp_num for
> a pool named pxa-ec froom 1024 to 999. All OSDs were up in the last map
> (31856) :
>
> # diff 31853.txt 31856.txt
> 1c1
> < epoch 31853
> ---
> > epoch 31856
> 4c4
> < modified 2020-03-04 14:41:52.079327
> ---
> > modified 2020-03-05 07:24:39.938326
> 24c24
> < pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31852
> lfor 0/21889/21905 flags hashpspool,ec_overwrites,selfmanaged_snaps
> stripe_width 16384 target_size_ratio 0.15 application rbd
> ---
> > pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
> rjenkins pg_num 1024 pgp_num 999 pg_num_target 256 pgp_num_target 256
> autoscale_mode on last_change 31856 lfor 0/21889/21905 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
> target_size_ratio 0.15 application rbd
> 28c28
> < pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31659
> lfor 0/28686/28688 flags hashpspool,ec_overwrites,selfmanaged_snaps
> stripe_width 16384 target_size_ratio 0.15 application rbd
> ---
> > pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
> rjenkins pg_num 1024 pgp_num 1024 pg_num_target 256 pgp_num_target 256
> autoscale_mode on last_change 31856 lfor 0/28686/28688 flags
> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
> target_size_ratio 0.15 application rbd
> 181d180
> < blacklist 141.26.152.64:0/3433151139 expires 2020-03-04 15:16:25.964333
>
> Rainer
>
> On 05.03.20 at 10:19, Dan van der Ster wrote:
> > Hi,
> >
> > There was movement already before you rebooted the node at 07:24:41.598004.
> > That tells me that it was a ceph-mon process that restarted and either
> > trimmed some upmaps or something similar.
> >
> > You can do this to see exactly what changed:
> >
> > # ceph osd getmap -o 31853 31853   # this is a guess -- pick an osdmap
> > epoch that was just before you upgraded.
> > # ceph osd getmap -o 31856 31856
> > # diff <(osdmaptool --print 31853) <(osdmaptool --print 31856)
> >
> > -- dan
> >
> >
> >
> > On Thu, Mar 5, 2020 at 10:05 AM Rainer Krienke  
> > wrote:
> >>
> >> Hello,
> >>
> >> before I ran the update to 14.2.8 I checked that the state was healthy
> >> with all OSDs up and in. I still have the command history I typed
> >> visible in my kde terminal buffer and there I see that after the update
> >> but before the reboot I ran a ceph -s and there were 144 osd's up and in
> >> the state was HEALTH_OK.
> >>
> >> Could it be of interest that the node rebooted was a monitor node and
> >> should mon_osd_down_out_interval at least in theory have prevented what
> >> happened to my cluster?
> >>
> >> Thanks
> >> Rainer
> >>
> >> On 05.03.20 at 09:49, Dan van der Ster wrote:
> >>> Did you have `144 total, 144 up, 144 in` also before the upgrade?
> >>> If an osd was out, then you upgraded/restarted and it went back in, it
> >>> would trigger data movement.
> >>> (I usually set noin before an upgrade).
> >>>
> >>> -- dan
> >>>
> >>> On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  
> >>> wrote:
> 
>  I found some information in ceph.log that might help to find out what
>  happened. node2  was the one I rebooted:
> 
>  2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
>  scrub starts
>  2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
>  scrub ok
>  2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
>  osdmap e31855: 144 total, 144 up, 144 in
>  2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
>  osdmap e31856: 144 total, 144 up, 144 in
>  2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
>  Health check failed: 1 pools have many more objects per pg than average
>  (MANY_OBJECTS_PER_PG)
>  2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
>  osdmap e31857: 144 total, 144 up, 144 in
>  2020-03-05 07:24:41.550395 mon.node2 (mon.0) 692728 : cluster [DBG]
>  osdmap e31858: 144 total, 144 up, 144 in
>  2020-03-05 07:24:41.598004 osd.127 (osd.127) 691 : cluster [DBG]
>  36.3eds0 starting backfill to osd.18(4) from (0'0,0'0] MAX to 
>  31854'297918
>  2020-03-05 07:24:41.619293 osd.127 (osd.127) 692 : cluster [DBG]
>  36.3eds0 starting backfill to osd.49(5) from (0'0,0'0] MAX to 

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
OK, this seems to make sense.

At the moment the cluster is still busy handling misplaced objects, but
when it's done, I will set autoscale to "warn" and also set the no... flags,
and then try to upgrade the next monitor and see if this goes more smoothly.

Thank you very much for your help. I learned a lot following your proposals.

Rainer

On 05.03.20 at 11:45, Dan van der Ster wrote:
> Ahh that's it! You have `autoscale_mode on` for the pool, and in
> 14.2.8 there was a fix to calculating how many PGs are needed in an
> erasure coded pool:
> 
> https://github.com/ceph/ceph/commit/0253205ef36acc6759a3a9687c5eb1b27aa901bf
> 
> So at the moment your PGs are merging.
> 
> If you want to stop that change, then set autoscale_mode to off or
> warn for the relevant pools, then set the pg_num back to the current
> (1024).
> 
> -- Dan
> 
> On Thu, Mar 5, 2020 at 11:19 AM Rainer Krienke  wrote:
>>
>> The difference was not a big one and consists in a change in pgp_num for
>> a pool named pxa-ec froom 1024 to 999. All OSDs were up in the last map
>> (31856) :
>>
>> # diff 31853.txt 31856.txt
>> 1c1
>> < epoch 31853
>> ---
>>> epoch 31856
>> 4c4
>> < modified 2020-03-04 14:41:52.079327
>> ---
>>> modified 2020-03-05 07:24:39.938326
>> 24c24
>> < pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31852
>> lfor 0/21889/21905 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 16384 target_size_ratio 0.15 application rbd
>> ---
>>> pool 36 'pxa-ec' erasure size 6 min_size 5 crush_rule 7 object_hash
>> rjenkins pg_num 1024 pgp_num 999 pg_num_target 256 pgp_num_target 256
>> autoscale_mode on last_change 31856 lfor 0/21889/21905 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>> target_size_ratio 0.15 application rbd
>> 28c28
>> < pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 31659
>> lfor 0/28686/28688 flags hashpspool,ec_overwrites,selfmanaged_snaps
>> stripe_width 16384 target_size_ratio 0.15 application rbd
>> ---
>>> pool 39 'pxb-ec' erasure size 6 min_size 5 crush_rule 3 object_hash
>> rjenkins pg_num 1024 pgp_num 1024 pg_num_target 256 pgp_num_target 256
>> autoscale_mode on last_change 31856 lfor 0/28686/28688 flags
>> hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384
>> target_size_ratio 0.15 application rbd
>> 181d180
>> < blacklist 141.26.152.64:0/3433151139 expires 2020-03-04 15:16:25.964333
>>
>> Rainer
>>
>> On 05.03.20 at 10:19, Dan van der Ster wrote:
>>> Hi,
>>>
>>> There was movement already before you rebooted the node at 07:24:41.598004.
>>> That tells me that it was a ceph-mon process that restarted and either
>>> trimmed some upmaps or something similar.
>>>
>>> You can do this to see exactly what changed:
>>>
>>> # ceph osd getmap -o 31853 31853   # this is a guess -- pick an osdmap
>>> epoch that was just before you upgraded.
>>> # ceph osd getmap -o 31856 31856
>>> # diff <(osdmaptool --print 31853) <(osdmaptool --print 31856)
>>>
>>> -- dan
>>>
>>>
>>>
>>> On Thu, Mar 5, 2020 at 10:05 AM Rainer Krienke  
>>> wrote:

 Hello,

 before I ran the update to 14.2.8 I checked that the state was healthy
 with all OSDs up and in. I still have the command history I typed
 visible in my kde terminal buffer and there I see that after the update
 but before the reboot I ran a ceph -s and there were 144 osd's up and in
 the state was HEALTH_OK.

 Could it be of interest that the node rebooted was a monitor node and
 should mon_osd_down_out_interval at least in theory have prevented what
 happened to my cluster?

 Thanks
 Rainer

 On 05.03.20 at 09:49, Dan van der Ster wrote:
> Did you have `144 total, 144 up, 144 in` also before the upgrade?
> If an osd was out, then you upgraded/restarted and it went back in, it
> would trigger data movement.
> (I usually set noin before an upgrade).
>
> -- dan
>
> On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke  
> wrote:
>>
>> I found some information in ceph.log that might help to find out what
>> happened. node2  was the one I rebooted:
>>
>> 2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323
>> scrub starts
>> 2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323
>> scrub ok
>> 2020-03-05 07:24:38.948404 mon.node2 (mon.0) 692706 : cluster [DBG]
>> osdmap e31855: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:39.969404 mon.node2 (mon.0) 692713 : cluster [DBG]
>> osdmap e31856: 144 total, 144 up, 144 in
>> 2020-03-05 07:24:39.979238 mon.node2 (mon.0) 692714 : cluster [WRN]
>> Health check failed: 1 pools have many more objects per pg than average
>> (MANY_OBJECTS_PER_PG)
>> 2020-03-05 07:24:40.533392 mon.node2 (mon.0) 692717 : cluster [DBG]
>> osdmap e31857:

[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread Simone Lazzaris
On Wednesday, 4 March 2020 at 18:14:31 CET, Chad William Seys wrote:
> > Maybe I've marked the object as "lost" and removed the failed
> > OSD.
> > 
> > The cluster now is healthy, but I'd like to understand if it's likely
> > to bother me again in the future.
> 
> Yeah, I don't know.
> 
> Within the last month there are 4 separate instances of people
> mentioning "unfound" object in their cluster.
> 
> I'm deferring as long as possible any OSD drive upgrades.  I ran into
> the problem when "draining" an OSD.
> 
> "draining" means remove OSD from crush map, wait for all PG to be stored
> elsewhere, then replace drive with larger one.  Under those
> circumstances there should be no PG unfound.
> 
> BTW, are you using cache tiering ?  The bug report mentions this, but
> some people did not have this enabled.
> 
> Chad.

No, I don't have cache tiering enabled. I also found it strange that the PG
was marked unfound: the cluster was perfectly healthy before the kernel
panic, and a single OSD failure shouldn't create much hassle.


*Simone Lazzaris*
*Qcom S.p.A. a Socio Unico*
 

Via Roggia Vignola, 9 | 24047 Treviglio (BG) | T +39 0363 1970352 | M +39
3938111237

simone.lazza...@qcom.it[1] | www.qcom.it[2]
* LinkedIn[3]* | *Facebook*[4]




[1] mailto:simone.lazza...@qcom.it
[2] https://www.qcom.it
[3] https://www.linkedin.com/company/qcom-spa
[4] http://www.facebook.com/qcomspa
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi all,
> 
> There's something broken in our env when we try to add new mons to
> existing clusters, confirmed on two clusters running mimic and
> nautilus. It's basically this issue
> https://tracker.ceph.com/issues/42830
> 
> In case something is wrong with our puppet manifests, I'm trying to
> doing it manually.
> 
> First we --mkfs the mon and start it, but as soon as the new mon
> starts synchronizing, the existing leader becomes unresponsive and an
> election is triggered.
> 
> Here's exactly what I'm doing:
> 
> # cd /var/lib/ceph/tmp/
> # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
> # ceph mon getmap -o monmap
> # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyring
> keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> # vi /etc/ceph/ceph.conf 
> [mon.cephmon4]
> host = cephmon4
> mon addr = a.b.c.d:6790
> # systemctl start ceph-mon@cephmon4
> 
> The log file on the new mon shows it start synchronizing, then
> immediately the CPU usage on the leader goes to 100% and elections
> start happening, and ceph health shows mon slow ops. perf top of the
> ceph-mon with 100% CPU is shown below [1].
> On a small nautilus cluster, the new mon gets added withing a minute
> or so (but not cleanly -- the leader is unresponsive for quite awhile
> until the new mon joins). debug_mon=20 on the leader doesn't show
> anything very interesting.
> On our large mimic cluster we tried waiting more than 10 minutes --
> suffering through several mon elections and 100% usage bouncing around
> between leaders -- until we gave up.
> 
> I'm pulling my hair out a bit on this -- it's really weird!

Can you try running a rocksdb compaction on the existing mons before 
adding the new one and see if that helps?
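
A rough sketch of how such a compaction can be triggered (the mon name and
store path below are placeholders; the offline variant needs the mon stopped
first):

online, one monitor at a time:
# ceph tell mon.cephmon1 compact

offline, against the stopped mon's store:
# systemctl stop ceph-mon@cephmon1
# ceph-kvstore-tool rocksdb /var/lib/ceph/mon/ceph-cephmon1/store.db compact
# systemctl start ceph-mon@cephmon1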

s

> 
> Did anyone add a new mon to an existing large cluster recently, and it
> went smoothly?
> 
> Cheers, Dan
> 
> [1]
> 
>   15.12%  ceph-mon [.]
> MonitorDBStore::Transaction::encode
>8.95%  libceph-common.so.0  [.]
> ceph::buffer::v14_2_0::ptr::append
>8.68%  libceph-common.so.0  [.]
> ceph::buffer::v14_2_0::list::append
>7.69%  libceph-common.so.0  [.]
> ceph::buffer::v14_2_0::ptr::release
>5.86%  libceph-common.so.0  [.]
> ceph::buffer::v14_2_0::ptr::ptr
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-mon store.db disk usage increase on OSD-Host fail

2020-03-05 Thread Hartwig Hauschild
Hi, 

I'm (still) testing upgrading from Luminous to Nautilus and ran into the
following situation:

The lab-setup I'm testing in has three OSD-Hosts. 
If one of those hosts dies the store.db in /var/lib/ceph/mon/ on all my
Mon-Nodes starts to rapidly grow in size until either the OSD-host comes
back up or disks are full.

On another cluster that's still on Luminous I don't see any growth at all.

Is that a difference in behaviour between Luminous and Nautilus or is that
caused by the lab-setup only having three hosts and one lost host causing
all PGs to be degraded at the same time?
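
One way to check whether the growth is just un-trimmed osdmaps (the mons
cannot trim the osdmap history past the point where all PGs were last clean,
so degraded PGs keep the store growing) is to watch the committed map range
next to the store size; a sketch, assuming default paths:

# du -sh /var/lib/ceph/mon/*/store.db
# ceph report 2>/dev/null | grep -E 'osdmap_(first|last)_committed'

If the gap between first and last committed keeps widening while the OSD host
is down, that is where the space is going; it gets trimmed again once the PGs
are clean.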


-- 
Cheers,
Hardy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Wido den Hollander



On 3/5/20 3:22 PM, Sage Weil wrote:
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
>> Hi all,
>>
>> There's something broken in our env when we try to add new mons to
>> existing clusters, confirmed on two clusters running mimic and
>> nautilus. It's basically this issue
>> https://tracker.ceph.com/issues/42830
>>
>> In case something is wrong with our puppet manifests, I'm trying to
>> doing it manually.
>>
>> First we --mkfs the mon and start it, but as soon as the new mon
>> starts synchronizing, the existing leader becomes unresponsive and an
>> election is triggered.
>>
>> Here's exactly what I'm doing:
>>
>> # cd /var/lib/ceph/tmp/
>> # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
>> # ceph mon getmap -o monmap
>> # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
>> keyring.mon.cephmon4 --setuser ceph --setgroup ceph
>> # vi /etc/ceph/ceph.conf 
>> [mon.cephmon4]
>> host = cephmon4
>> mon addr = a.b.c.d:6790
>> # systemctl start ceph-mon@cephmon4
>>
>> The log file on the new mon shows it start synchronizing, then
>> immediately the CPU usage on the leader goes to 100% and elections
>> start happening, and ceph health shows mon slow ops. perf top of the
>> ceph-mon with 100% CPU is shown below [1].
>> On a small nautilus cluster, the new mon gets added withing a minute
>> or so (but not cleanly -- the leader is unresponsive for quite awhile
>> until the new mon joins). debug_mon=20 on the leader doesn't show
>> anything very interesting.
>> On our large mimic cluster we tried waiting more than 10 minutes --
>> suffering through several mon elections and 100% usage bouncing around
>> between leaders -- until we gave up.
>>
>> I'm pulling my hair out a bit on this -- it's really weird!
> 
> Can you try running a rocksdb compaction on the existing mons before 
> adding the new one and see if that helps?

I can chime in here: I had this happen to a customer as well.

Compact did not work.

Some background:

5 Monitors and the DBs were ~350M in size. They upgraded one MON from
13.2.6 to 13.2.8 and that caused one MON (sync source) to eat 100% CPU.

The logs showed that the upgraded MON (which was restarted) was in the
synchronizing state.

Because they had 5 MONs they now had 3 left so the cluster kept running.

I left this for about 5 minutes, but it never synced.

I tried a compact, didn't work either.

Eventually I stopped one MON, tarballed its database and used that to
bring back the MON which was upgraded to 13.2.8.

That worked without any hiccups. The MON joined again within a few seconds.

Wido

> 
> s
> 
>>
>> Did anyone add a new mon to an existing large cluster recently, and it
>> went smoothly?
>>
>> Cheers, Dan
>>
>> [1]
>>
>>   15.12%  ceph-mon [.]
>> MonitorDBStore::Transaction::encode
>>8.95%  libceph-common.so.0  [.]
>> ceph::buffer::v14_2_0::ptr::append
>>8.68%  libceph-common.so.0  [.]
>> ceph::buffer::v14_2_0::list::append
>>7.69%  libceph-common.so.0  [.]
>> ceph::buffer::v14_2_0::ptr::release
>>5.86%  libceph-common.so.0  [.]
>> ceph::buffer::v14_2_0::ptr::ptr
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after migrating from Ceph Luminous to Nautilus.

2020-03-05 Thread Casey Bodley



On 3/3/20 2:33 PM, Scheurer François wrote:


/(resending to the new maillist)/


Dear Casey, Dear All,



We tested the migration from Luminous to Nautilus and noticed two 
regressions breaking the RGW integration in Openstack:







1)  the following config parameter is not working on Nautilus but is 
valid on Luminous and on Master:

    rgw_keystone_implicit_tenants = swift

    In the log: parse error setting 
'rgw_keystone_implicit_tenants' to 'swift' (Expected option value to 
be integer, got 'swift')


    This param is important to make RGW work for both S3 and Swift.
    Setting it to false breaks Swift/OpenStack, and setting it to true
makes S3 incompatible with dns-style bucket names (with shared or
public access).
    Please note that path-style bucket names are deprecated by AWS and
most clients only support dns-style...


    Ref.:
https://tracker.ceph.com/issues/24348 


https://github.com/ceph/ceph/commit/3ba7be8d1ac7ee43e69eebb58263cd080cca1d38


Ok, wow. It looks like this commit was backported to luminous in 
https://github.com/ceph/ceph/pull/22363 over a year before it actually 
merged to master as part of https://github.com/ceph/ceph/pull/28813, so 
missed the mimic and nautilus releases. I prepared those backports in 
https://tracker.ceph.com/issues/5 and 
https://tracker.ceph.com/issues/4.








2) the server-side encryption (SSE-KMS) is broken on Nautilus:

    to reproduce the issue:
    s3cmd --access_key $ACCESSKEY --secret_key $SECRETKEY \
        --host-bucket "%(bucket)s.$ENDPOINT" --host "$ENDPOINT" \
        --region="$REGION" --signature-v2 --no-preserve --no-ssl \
        --server-side-encryption --server-side-encryption-kms-id ${SECRET##*/} \
        put helloenc.txt s3://testenc/


    output:
    upload: 'helloenc.txt' -> 's3://testenc/helloenc.txt'  [1 of 1]
     9 of 9   100% in    0s    37.50 B/s  done
    ERROR: S3 error: 403 (AccessDenied): Failed to retrieve the actual key,
    kms-keyid: cd0903db-c613-49be-96d9-165c02544bc7

    rgw log: see below


    TLDR: after investigating, I found that radosgw was actually 
getting the barbican secret correctly but the HTTP CODE (=200) 
validation was failing because of a bug in Nautilus.


    My understanding is the following (please correct me):

    The bug is in src/rgw/rgw_http_client.cc.

    Since Nautilus, HTTP codes are converted into error codes (200
becomes 0) during request processing.
    This happens in RGWHTTPManager::reqs_thread_entry(), which
centralizes the processing of (curl) HTTP requests with multi-threading.


    This is fine, but the member variable http_status of the class
RGWHTTPClient is not updated with the resulting HTTP code, so the
variable keeps its initial value of 0.


    Then in src/rgw/rgw_crypt.cc the logic is still verifying that 
http_status is in range [200,299] and this fails...


    I wrote the following one-liner bugfix for src/rgw/rgw_http_client.cc:


diff --git a/src/rgw/rgw_http_client.cc b/src/rgw/rgw_http_client.cc
index d0f0baead6..7c115293ad 100644
--- a/src/rgw/rgw_http_client.cc
+++ b/src/rgw/rgw_http_client.cc
@@ -1146,6 +1146,7 @@ void *RGWHTTPManager::reqs_thread_entry()
         status = -EAGAIN;
       }
       int id = req_data->id;
+      req_data->client->http_status = http_status;
       finish_request(req_data, status);
       switch (result) {
         case CURLE_OK:

    The s3cmd upload then works fine with KMS server-side encryption.




Thanks. This one was also fixed on master in 
https://github.com/ceph/ceph/pull/29639 but didn't get backports. I 
opened https://tracker.ceph.com/issues/3 to track those for mimic 
and nautilus.





Questions:

  *     Could someone please write a fix for the regression of 1) and
make a PR ?
  *     Could somebody also make a PR for 2?



Thank you for your help. :-)



Cheers
Francois Scheurer


rgw log:
    export CLUSTER=ceph; /home/local/ceph/build/bin/radosgw -f --cluster ${CLUSTER} \
        --name client.rgw.$(hostname) --setuser ceph --setgroup ceph &
    tail -fn0 /var/log/ceph/ceph-client.rgw.ewos1-osd1-stage.log | less -IS
    2020-02-26 16:32:59.208 7fc1f1c54700 20 Getting KMS encryption key for key=cd0903db-c613-49be-96d9-165c02544bc7
    2020-02-26 16:32:59.208 7fc1f1c54700 20 Requesting secret from barbican url=http://keystone.service.stage.i.ewcs.ch:5000/v3/auth/tokens
    2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTPClient::process: http_status: 0
    2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::process
    2020-02-26 16:32:59.208 7fc1f1c54700 20 ewdebug: RGWHTTP::send
    2020-02-26 16:32:59.208 7fc1f1c54700 20 sending request to http://keystone.service.stage.i.ewcs.ch:5000/v3/aut

[ceph-users] Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after migrating from Ceph Luminous to Nautilus.

2020-03-05 Thread Scheurer François
Dear Casey

Many thanks  that's great to get your help! 

Cheers
Francois



From: Casey Bodley 
Sent: Thursday, March 5, 2020 3:57 PM
To: Scheurer François; ceph-users@ceph.io
Cc: Engelmann Florian; Rafael Weingärtner
Subject: Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack 
after migrating from Ceph Luminous to Nautilus.

On 3/3/20 2:33 PM, Scheurer François wrote:
>
> /(resending to the new maillist)/
>
>
> Dear Casey, Dear All,
>
>
>
> We tested the migration from Luminous to Nautilus and noticed two
> regressions breaking the RGW integration in Openstack:
>
>
>
>
>
>
> 1)  the following config parameter is not working on Nautilus but is
> valid on Luminous and on Master:
> rgw_keystone_implicit_tenants = swift
>
> In the log: parse error setting
> 'rgw_keystone_implicit_tenants' to 'swift' (Expected option value to
> be integer, got 'swift')
>
> This param is important to make RGW working for S3 and Swift.
> Setting it to false breaks swift/openstack and setting it to true
> makes S3 incompatible with dns-style bucketnames (with shared or
> public access).
> Please note that path-style bucketnames are deprecated by AWS and
> most clients are only supporting dns-style...
>
> Ref.:
> https://tracker.ceph.com/issues/24348
> 
> https://github.com/ceph/ceph/commit/3ba7be8d1ac7ee43e69eebb58263cd080cca1d38
>
>
Ok, wow. It looks like this commit was backported to luminous in
https://github.com/ceph/ceph/pull/22363 over a year before it actually
merged to master as part of https://github.com/ceph/ceph/pull/28813, so
missed the mimic and nautilus releases. I prepared those backports in
https://tracker.ceph.com/issues/5 and
https://tracker.ceph.com/issues/4.


>
>
>
>
> 2) the server-side encryption (SSE-KMS) is broken on Nautilus:
>
> to reproduce the issue:
> s3cmd --access_key $ACCESSKEY --secret_key $SECRETKEY
> --host-bucket "%(bucket)s.$ENDPOINT" --host "$ENDPOINT"
> --region="$REGION" --signature-v2 --no-preserve --no-ssl
> --server-side-encryption --server-side-encryption-kms-id ${SECRET##*/}
> put helloenc.txt s3://testenc/
>
> output:
> upload: 'helloenc.txt' -> 's3://testenc/helloenc.txt'  [1
> of 1]
> 9 of 9   100% in0s37.50 B/s  done
> ERROR: S3 error: 403 (AccessDenied): Failed to retrieve
> the actual key, kms-keyid: cd0903db-c613-49be-96d9-165c02544bc7
> rgw log: see below
>
>
> TLDR: after investigating, I found that radosgw was actually
> getting the barbican secret correctly but the HTTP CODE (=200)
> validation was failing because of a bug in Nautilus.
>
> My understanding is following (please correct me):
>
> The bug in src/rgw/rgw_http_client.cc .
>
> Since Nautilus HTTP_CODE are converted into ERROR_CODE (200
> becomes 0) in the request processing.
> This happens in RGWHTTPManager::reqs_thread_entry(), which
> centralizes the processing of (curl) HTTP Requests with multi-treading.
>
> This is fine but the member variable http_status of the class
> RGWHTTPClient is not updated with the resulting HTTP CODE, so the
> variable keeps its initial value of 0.
>
> Then in src/rgw/rgw_crypt.cc the logic is still verifying that
> http_status is in range [200,299] and this fails...
>
> I wrote the following oneliner bugfix for
> src/rgw/rgw_http_client.cc:
>
> diff --git a/src/rgw/rgw_http_client.cc
> b/src/rgw/rgw_http_client.cc
> index d0f0baead6..7c115293ad 100644
> --- a/src/rgw/rgw_http_client.cc
> +++ b/src/rgw/rgw_http_client.cc
> @@ -1146,6 +1146,7 @@ void
> *RGWHTTPManager::reqs_thread_entry()
>status = -EAGAIN;
>  }
>  int id = req_data->id;
> + req_data->client->http_status = http_status;
> finish_request(req_data, status);
>  switch (result) {
>case CURLE_OK:
>
> The s3cmd is then working fine with KMS server side encryption.
>
>
>
>
Thanks. This one was also fixed on master in
https://github.com/ceph/ceph/pull/29639 but didn't get backports. I
opened https://tracker.ceph.com/issues/3 to track those for mimic
and nautilus.

>
>
> Questions:
>
>   * Could someone please write a fix for the regression of 1) and
> make a PR ?
>   * Could somebody also make a PR for 2?
>
>
>
> Thank you for your help. :-)
>
>
>
> Cheers
> Francois Scheurer
>
>
> rgw log:
> export CLUSTER=ceph; /home/local/ceph/build/bin/radosgw -f
> --cluster ${CLUSTER} --name client.rgw.$(hostname) --setuser ceph
> --setgroup ceph &
> tail -fn0 /var/log/ceph/ceph-client.rgw.ewos1-osd1-stage.log |
> less -IS
> 2020-02-26 16:32:59.208 7fc1f1c54700 20 Getting KMS
> encryption key for key=cd0903db-c613-49be-96d9-165c

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
Hi Sage,

On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > Hi all,
> >
> > There's something broken in our env when we try to add new mons to
> > existing clusters, confirmed on two clusters running mimic and
> > nautilus. It's basically this issue
> > https://tracker.ceph.com/issues/42830
> >
> > In case something is wrong with our puppet manifests, I'm trying to
> > doing it manually.
> >
> > First we --mkfs the mon and start it, but as soon as the new mon
> > starts synchronizing, the existing leader becomes unresponsive and an
> > election is triggered.
> >
> > Here's exactly what I'm doing:
> >
> > # cd /var/lib/ceph/tmp/
> > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
> > # ceph mon getmap -o monmap
> > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > # vi /etc/ceph/ceph.conf 
> > [mon.cephmon4]
> > host = cephmon4
> > mon addr = a.b.c.d:6790
> > # systemctl start ceph-mon@cephmon4
> >
> > The log file on the new mon shows it start synchronizing, then
> > immediately the CPU usage on the leader goes to 100% and elections
> > start happening, and ceph health shows mon slow ops. perf top of the
> > ceph-mon with 100% CPU is shown below [1].
> > On a small nautilus cluster, the new mon gets added withing a minute
> > or so (but not cleanly -- the leader is unresponsive for quite awhile
> > until the new mon joins). debug_mon=20 on the leader doesn't show
> > anything very interesting.
> > On our large mimic cluster we tried waiting more than 10 minutes --
> > suffering through several mon elections and 100% usage bouncing around
> > between leaders -- until we gave up.
> >
> > I'm pulling my hair out a bit on this -- it's really weird!
>
> Can you try running a rocksdb compaction on the existing mons before
> adding the new one and see if that helps?

It doesn't help. I compacted the 3 mons in quorum then started a new
one with debug mon & paxos = 20.

ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83

I stopped that new mon as soon as the sync source started spinning
100% and left the quorum.

-- Dan


>
> s
>
> >
> > Did anyone add a new mon to an existing large cluster recently, and it
> > went smoothly?
> >
> > Cheers, Dan
> >
> > [1]
> >
> >   15.12%  ceph-mon [.]
> > MonitorDBStore::Transaction::encode
> >8.95%  libceph-common.so.0  [.]
> > ceph::buffer::v14_2_0::ptr::append
> >8.68%  libceph-common.so.0  [.]
> > ceph::buffer::v14_2_0::list::append
> >7.69%  libceph-common.so.0  [.]
> > ceph::buffer::v14_2_0::ptr::release
> >5.86%  libceph-common.so.0  [.]
> > ceph::buffer::v14_2_0::ptr::ptr
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 3:31 PM Wido den Hollander  wrote:
>
>
>
> On 3/5/20 3:22 PM, Sage Weil wrote:
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> >> Hi all,
> >>
> >> There's something broken in our env when we try to add new mons to
> >> existing clusters, confirmed on two clusters running mimic and
> >> nautilus. It's basically this issue
> >> https://tracker.ceph.com/issues/42830
> >>
> >> In case something is wrong with our puppet manifests, I'm trying to
> >> doing it manually.
> >>
> >> First we --mkfs the mon and start it, but as soon as the new mon
> >> starts synchronizing, the existing leader becomes unresponsive and an
> >> election is triggered.
> >>
> >> Here's exactly what I'm doing:
> >>
> >> # cd /var/lib/ceph/tmp/
> >> # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
> >> # ceph mon getmap -o monmap
> >> # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> >> keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> >> # vi /etc/ceph/ceph.conf 
> >> [mon.cephmon4]
> >> host = cephmon4
> >> mon addr = a.b.c.d:6790
> >> # systemctl start ceph-mon@cephmon4
> >>
> >> The log file on the new mon shows it start synchronizing, then
> >> immediately the CPU usage on the leader goes to 100% and elections
> >> start happening, and ceph health shows mon slow ops. perf top of the
> >> ceph-mon with 100% CPU is shown below [1].
> >> On a small nautilus cluster, the new mon gets added withing a minute
> >> or so (but not cleanly -- the leader is unresponsive for quite awhile
> >> until the new mon joins). debug_mon=20 on the leader doesn't show
> >> anything very interesting.
> >> On our large mimic cluster we tried waiting more than 10 minutes --
> >> suffering through several mon elections and 100% usage bouncing around
> >> between leaders -- until we gave up.
> >>
> >> I'm pulling my hair out a bit on this -- it's really weird!
> >
> > Can you try running a rocksdb compaction on the existing mons before
> > adding the new one and see if that helps?
>
> I can chime in here: I had this happen to a customer as well.
>
> Compact did not work.
>
> Some background:
>
> 5 Monitors and the DBs were ~350M in size. They upgraded one MON from
> 13.2.6 to 13.2.8 and that caused one MON (sync source) to eat 100% CPU.
>
> The logs showed that the upgraded MON (which was restarted) was in the
> synchronizing state.
>
> Because they had 5 MONs they now had 3 left so the cluster kept running.
>
> I left this for about 5 minutes, but it never synced.
>
> I tried a compact, didn't work either.
>
> Eventually I stopped one MON, tarballed it's database and used that to
> bring back the MON which was upgraded to 13.2.8

Yeah, that works! -- something like:

ceph mon add  ip:6789
rsync :/var/lib/ceph/mon/ceph.. /var/lib/ceph/mon
systemctl start ceph-mon.target

I guess that's a workaround, but would be good to find out why the
sync source is spinning.
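
Spelled out a bit more fully (the host and mon names below are placeholders;
the source mon is stopped briefly, as in Wido's tarball approach, so that the
copied store is consistent), run from the new mon host:

# ceph mon add cephmon4 a.b.c.d:6789
# ssh cephmon1 systemctl stop ceph-mon@cephmon1
# rsync -a cephmon1:/var/lib/ceph/mon/ceph-cephmon1/ /var/lib/ceph/mon/ceph-cephmon4/
# ssh cephmon1 systemctl start ceph-mon@cephmon1
# chown -R ceph:ceph /var/lib/ceph/mon/ceph-cephmon4
# systemctl start ceph-mon@cephmon4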

-- dan



>
> That work without any hickups. The MON joined again within a few seconds.
>
> Wido
>
> >
> > s
> >
> >>
> >> Did anyone add a new mon to an existing large cluster recently, and it
> >> went smoothly?
> >>
> >> Cheers, Dan
> >>
> >> [1]
> >>
> >>   15.12%  ceph-mon [.]
> >> MonitorDBStore::Transaction::encode
> >>8.95%  libceph-common.so.0  [.]
> >> ceph::buffer::v14_2_0::ptr::append
> >>8.68%  libceph-common.so.0  [.]
> >> ceph::buffer::v14_2_0::list::append
> >>7.69%  libceph-common.so.0  [.]
> >> ceph::buffer::v14_2_0::ptr::release
> >>5.86%  libceph-common.so.0  [.]
> >> ceph::buffer::v14_2_0::ptr::ptr
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> Hi Sage,
> 
> On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi all,
> > >
> > > There's something broken in our env when we try to add new mons to
> > > existing clusters, confirmed on two clusters running mimic and
> > > nautilus. It's basically this issue
> > > https://tracker.ceph.com/issues/42830
> > >
> > > In case something is wrong with our puppet manifests, I'm trying to
> > > doing it manually.
> > >
> > > First we --mkfs the mon and start it, but as soon as the new mon
> > > starts synchronizing, the existing leader becomes unresponsive and an
> > > election is triggered.
> > >
> > > Here's exactly what I'm doing:
> > >
> > > # cd /var/lib/ceph/tmp/
> > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 keyring.mon.cephmon4
> > > # ceph mon getmap -o monmap
> > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > # vi /etc/ceph/ceph.conf 
> > > [mon.cephmon4]
> > > host = cephmon4
> > > mon addr = a.b.c.d:6790
> > > # systemctl start ceph-mon@cephmon4
> > >
> > > The log file on the new mon shows it start synchronizing, then
> > > immediately the CPU usage on the leader goes to 100% and elections
> > > start happening, and ceph health shows mon slow ops. perf top of the
> > > ceph-mon with 100% CPU is shown below [1].
> > > On a small nautilus cluster, the new mon gets added withing a minute
> > > or so (but not cleanly -- the leader is unresponsive for quite awhile
> > > until the new mon joins). debug_mon=20 on the leader doesn't show
> > > anything very interesting.
> > > On our large mimic cluster we tried waiting more than 10 minutes --
> > > suffering through several mon elections and 100% usage bouncing around
> > > between leaders -- until we gave up.
> > >
> > > I'm pulling my hair out a bit on this -- it's really weird!
> >
> > Can you try running a rocksdb compaction on the existing mons before
> > adding the new one and see if that helps?
> 
> It doesn't help. I compacted the 3 mons in quorum then started a new
> one with debug mon & paxos = 20.
> 
> ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> 
> I stopped that new mon as soon as the sync source started spinning
> 100% and left the quorum.

Can you include the log from the sync source too?  That's presumably where 
the bug is.

Thanks!
sage

> 
> -- Dan
> 
> 
> >
> > s
> >
> > >
> > > Did anyone add a new mon to an existing large cluster recently, and it
> > > went smoothly?
> > >
> > > Cheers, Dan
> > >
> > > [1]
> > >
> > >   15.12%  ceph-mon [.]
> > > MonitorDBStore::Transaction::encode
> > >8.95%  libceph-common.so.0  [.]
> > > ceph::buffer::v14_2_0::ptr::append
> > >8.68%  libceph-common.so.0  [.]
> > > ceph::buffer::v14_2_0::list::append
> > >7.69%  libceph-common.so.0  [.]
> > > ceph::buffer::v14_2_0::ptr::release
> > >5.86%  libceph-common.so.0  [.]
> > > ceph::buffer::v14_2_0::ptr::ptr
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > >
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error in Telemetry Module

2020-03-05 Thread Lenz Grimmer
On 2020-03-05 04:22, Anthony D'Atri wrote:

>>> The message HEALTH_ERR, in red, on the front of the dashboard, is an
>>> interesting way to start the day. ;)
>>
>> If possible, I'd suggest to change this into a HEALTH_WARN state -
>> heaven is not falling down just because the telemetry module can't reach
>> its server...
> 
> Seconded.  Neither client data integrity nor availability are at risk.

I wonder if this issue already captured this problem:

https://tracker.ceph.com/issues/43963

Patch (master): https://github.com/ceph/ceph/pull/33070 (merged)
Patch (nautilus): https://github.com/ceph/ceph/pull/33141 (pending merge)

Looks like that fix missed the 14.2.8 merge window, so it will hopefully
be included in 14.2.9
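
In the meantime the module can be checked, and temporarily switched off, from
the CLI (a sketch; it can be re-enabled later with "ceph telemetry on"):

# ceph telemetry status
# ceph telemetry off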

Lenz

-- 
SUSE Software Solutions Germany GmbH - Maxfeldstr. 5 - 90409 Nuernberg
GF: Felix Imendörffer, HRB 36809 (AG Nürnberg)



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rbd-mirror - which direction?

2020-03-05 Thread Ml Ml
Hello,

I am running Luminous and I would like to back up my cluster from
Site-A to Site-B (one way).

So I decided to mirror it to an off-site Ceph cluster.

I read: https://docs.ceph.com/docs/luminous/rbd/rbd-mirroring/
but I liked https://github.com/MiracleMa/Blog/issues/2 a little better.

But which parameter/part determines the direction in which it will mirror?

Thanks,
Mario
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread Chad William Seys



No, I don't have cache tiering enabled. I also found strange that the PG 
was marked unfound: the cluster was perfectly healthy before the kernel 
panic and a single OSD failure shouldn't create mush hassle.


Yes, it is a bug unless using a singly replicated pool!

C.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > Hi Sage,
> >
> > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > Hi all,
> > > >
> > > > There's something broken in our env when we try to add new mons to
> > > > existing clusters, confirmed on two clusters running mimic and
> > > > nautilus. It's basically this issue
> > > > https://tracker.ceph.com/issues/42830
> > > >
> > > > In case something is wrong with our puppet manifests, I'm trying to
> > > > doing it manually.
> > > >
> > > > First we --mkfs the mon and start it, but as soon as the new mon
> > > > starts synchronizing, the existing leader becomes unresponsive and an
> > > > election is triggered.
> > > >
> > > > Here's exactly what I'm doing:
> > > >
> > > > # cd /var/lib/ceph/tmp/
> > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > keyring.mon.cephmon4
> > > > # ceph mon getmap -o monmap
> > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > # vi /etc/ceph/ceph.conf 
> > > > [mon.cephmon4]
> > > > host = cephmon4
> > > > mon addr = a.b.c.d:6790
> > > > # systemctl start ceph-mon@cephmon4
> > > >
> > > > The log file on the new mon shows it start synchronizing, then
> > > > immediately the CPU usage on the leader goes to 100% and elections
> > > > start happening, and ceph health shows mon slow ops. perf top of the
> > > > ceph-mon with 100% CPU is shown below [1].
> > > > On a small nautilus cluster, the new mon gets added withing a minute
> > > > or so (but not cleanly -- the leader is unresponsive for quite awhile
> > > > until the new mon joins). debug_mon=20 on the leader doesn't show
> > > > anything very interesting.
> > > > On our large mimic cluster we tried waiting more than 10 minutes --
> > > > suffering through several mon elections and 100% usage bouncing around
> > > > between leaders -- until we gave up.
> > > >
> > > > I'm pulling my hair out a bit on this -- it's really weird!
> > >
> > > Can you try running a rocksdb compaction on the existing mons before
> > > adding the new one and see if that helps?
> >
> > It doesn't help. I compacted the 3 mons in quorum then started a new
> > one with debug mon & paxos = 20.
> >
> > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> >
> > I stopped that new mon as soon as the sync source started spinning
> > 100% and left the quorum.
>
> Can you include the log from teh sync source too?  That's presumably where
> the bug is.

Here's a different new mon and the leader, with debug_paxos & mon = 20:

ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055

Things start to go wrong at this line:

2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2

...which is just before it tries to sync osd_snap.

I also included the output of ceph-monstore-tool dump-keys. There are
really a lot of osd_snap keys!
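
For anyone who wants to reproduce that check, the keys can be dumped from a
stopped mon (or from a copy of its store) and then filtered; the mon name and
paths below are placeholders:

# systemctl stop ceph-mon@cephmon1
# ceph-monstore-tool /var/lib/ceph/mon/ceph-cephmon1 dump-keys > /tmp/mon-keys.txt
# systemctl start ceph-mon@cephmon1
# grep -c osd_snap /tmp/mon-keys.txt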

Thanks!

-- dan

>
> Thanks!
> sage
>
> >
> > -- Dan
> >
> >
> > >
> > > s
> > >
> > > >
> > > > Did anyone add a new mon to an existing large cluster recently, and it
> > > > went smoothly?
> > > >
> > > > Cheers, Dan
> > > >
> > > > [1]
> > > >
> > > >   15.12%  ceph-mon [.]
> > > > MonitorDBStore::Transaction::encode
> > > >8.95%  libceph-common.so.0  [.]
> > > > ceph::buffer::v14_2_0::ptr::append
> > > >8.68%  libceph-common.so.0  [.]
> > > > ceph::buffer::v14_2_0::list::append
> > > >7.69%  libceph-common.so.0  [.]
> > > > ceph::buffer::v14_2_0::ptr::release
> > > >5.86%  libceph-common.so.0  [.]
> > > > ceph::buffer::v14_2_0::ptr::ptr
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> > > >
> >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > Hi Sage,
> > >
> > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > Hi all,
> > > > >
> > > > > There's something broken in our env when we try to add new mons to
> > > > > existing clusters, confirmed on two clusters running mimic and
> > > > > nautilus. It's basically this issue
> > > > > https://tracker.ceph.com/issues/42830
> > > > >
> > > > > In case something is wrong with our puppet manifests, I'm trying to
> > > > > doing it manually.
> > > > >
> > > > > First we --mkfs the mon and start it, but as soon as the new mon
> > > > > starts synchronizing, the existing leader becomes unresponsive and an
> > > > > election is triggered.
> > > > >
> > > > > Here's exactly what I'm doing:
> > > > >
> > > > > # cd /var/lib/ceph/tmp/
> > > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > > keyring.mon.cephmon4
> > > > > # ceph mon getmap -o monmap
> > > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > > # vi /etc/ceph/ceph.conf 
> > > > > [mon.cephmon4]
> > > > > host = cephmon4
> > > > > mon addr = a.b.c.d:6790
> > > > > # systemctl start ceph-mon@cephmon4
> > > > >
> > > > > The log file on the new mon shows it start synchronizing, then
> > > > > immediately the CPU usage on the leader goes to 100% and elections
> > > > > start happening, and ceph health shows mon slow ops. perf top of the
> > > > > ceph-mon with 100% CPU is shown below [1].
> > > > > On a small nautilus cluster, the new mon gets added withing a minute
> > > > > or so (but not cleanly -- the leader is unresponsive for quite awhile
> > > > > until the new mon joins). debug_mon=20 on the leader doesn't show
> > > > > anything very interesting.
> > > > > On our large mimic cluster we tried waiting more than 10 minutes --
> > > > > suffering through several mon elections and 100% usage bouncing around
> > > > > between leaders -- until we gave up.
> > > > >
> > > > > I'm pulling my hair out a bit on this -- it's really weird!
> > > >
> > > > Can you try running a rocksdb compaction on the existing mons before
> > > > adding the new one and see if that helps?
> > >
> > > It doesn't help. I compacted the 3 mons in quorum then started a new
> > > one with debug mon & paxos = 20.
> > >
> > > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> > >
> > > I stopped that new mon as soon as the sync source started spinning
> > > 100% and left the quorum.
> >
> > Can you include the log from teh sync source too?  That's presumably where
> > the bug is.
> 
> Here's a different new mon and the leader, with debug_paxos & mon = 20:
> 
> ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055
> 
> Things start to go wrong at this line:
> 
> 2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
> e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2
> 
> ...which is the just before it tries to sync osd_snap.
> 
> I also included the output of ceph-monstore-tool dump-keys. There are
> really a lot of osd_snap keys!

Aha, I knew this sounded familiar! See 
https://github.com/ceph/ceph/pull/31581

We should backport this for the next nautilus...

sage


> 
> Thanks!
> 
> -- dan
> 
> >
> > Thanks!
> > sage
> >
> > >
> > > -- Dan
> > >
> > >
> > > >
> > > > s
> > > >
> > > > >
> > > > > Did anyone add a new mon to an existing large cluster recently, and it
> > > > > went smoothly?
> > > > >
> > > > > Cheers, Dan
> > > > >
> > > > > [1]
> > > > >
> > > > >   15.12%  ceph-mon [.]
> > > > > MonitorDBStore::Transaction::encode
> > > > >8.95%  libceph-common.so.0  [.]
> > > > > ceph::buffer::v14_2_0::ptr::append
> > > > >8.68%  libceph-common.so.0  [.]
> > > > > ceph::buffer::v14_2_0::list::append
> > > > >7.69%  libceph-common.so.0  [.]
> > > > > ceph::buffer::v14_2_0::ptr::release
> > > > >5.86%  libceph-common.so.0  [.]
> > > > > ceph::buffer::v14_2_0::ptr::ptr
> > > > > ___
> > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > >
> > > > >
> > >
> > >
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:05 PM Sage Weil  wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > Hi Sage,
> > > >
> > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> > > > >
> > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > Hi all,
> > > > > >
> > > > > > There's something broken in our env when we try to add new mons to
> > > > > > existing clusters, confirmed on two clusters running mimic and
> > > > > > nautilus. It's basically this issue
> > > > > > https://tracker.ceph.com/issues/42830
> > > > > >
> > > > > > In case something is wrong with our puppet manifests, I'm trying to
> > > > > > doing it manually.
> > > > > >
> > > > > > First we --mkfs the mon and start it, but as soon as the new mon
> > > > > > starts synchronizing, the existing leader becomes unresponsive and 
> > > > > > an
> > > > > > election is triggered.
> > > > > >
> > > > > > Here's exactly what I'm doing:
> > > > > >
> > > > > > # cd /var/lib/ceph/tmp/
> > > > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > > > keyring.mon.cephmon4
> > > > > > # ceph mon getmap -o monmap
> > > > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > > > # vi /etc/ceph/ceph.conf 
> > > > > > [mon.cephmon4]
> > > > > > host = cephmon4
> > > > > > mon addr = a.b.c.d:6790
> > > > > > # systemctl start ceph-mon@cephmon4
> > > > > >
> > > > > > The log file on the new mon shows it start synchronizing, then
> > > > > > immediately the CPU usage on the leader goes to 100% and elections
> > > > > > start happening, and ceph health shows mon slow ops. perf top of the
> > > > > > ceph-mon with 100% CPU is shown below [1].
> > > > > > On a small nautilus cluster, the new mon gets added withing a minute
> > > > > > or so (but not cleanly -- the leader is unresponsive for quite 
> > > > > > awhile
> > > > > > until the new mon joins). debug_mon=20 on the leader doesn't show
> > > > > > anything very interesting.
> > > > > > On our large mimic cluster we tried waiting more than 10 minutes --
> > > > > > suffering through several mon elections and 100% usage bouncing 
> > > > > > around
> > > > > > between leaders -- until we gave up.
> > > > > >
> > > > > > I'm pulling my hair out a bit on this -- it's really weird!
> > > > >
> > > > > Can you try running a rocksdb compaction on the existing mons before
> > > > > adding the new one and see if that helps?
> > > >
> > > > It doesn't help. I compacted the 3 mons in quorum then started a new
> > > > one with debug mon & paxos = 20.
> > > >
> > > > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> > > >
> > > > I stopped that new mon as soon as the sync source started spinning
> > > > 100% and left the quorum.
> > >
> > > Can you include the log from teh sync source too?  That's presumably where
> > > the bug is.
> >
> > Here's a different new mon and the leader, with debug_paxos & mon = 20:
> >
> > ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055
> >
> > Things start to go wrong at this line:
> >
> > 2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
> > e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2
> >
> > ...which is the just before it tries to sync osd_snap.
> >
> > I also included the output of ceph-monstore-tool dump-keys. There are
> > really a lot of osd_snap keys!
>
> Aha, I knew this sounded familiar! See
> https://github.com/ceph/ceph/pull/31581
>
> We should backport this for the next nautilus...
>

Perfect.. thanks!!

-- dan



> sage
>
>
> >
> > Thanks!
> >
> > -- dan
> >
> > >
> > > Thanks!
> > > sage
> > >
> > > >
> > > > -- Dan
> > > >
> > > >
> > > > >
> > > > > s
> > > > >
> > > > > >
> > > > > > Did anyone add a new mon to an existing large cluster recently, and 
> > > > > > it
> > > > > > went smoothly?
> > > > > >
> > > > > > Cheers, Dan
> > > > > >
> > > > > > [1]
> > > > > >
> > > > > >   15.12%  ceph-mon [.]
> > > > > > MonitorDBStore::Transaction::encode
> > > > > >8.95%  libceph-common.so.0  [.]
> > > > > > ceph::buffer::v14_2_0::ptr::append
> > > > > >8.68%  libceph-common.so.0  [.]
> > > > > > ceph::buffer::v14_2_0::list::append
> > > > > >7.69%  libceph-common.so.0  [.]
> > > > > > ceph::buffer::v14_2_0::ptr::release
> > > > > >5.86%  libceph-common.so.0  [.]
> > > > > > ceph::buffer::v14_2_0::ptr::ptr
> > > > > > ___
> > > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster  wrote:
>
> On Thu, Mar 5, 2020 at 8:05 PM Sage Weil  wrote:
> >
> > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > Hi Sage,
> > > > >
> > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> > > > > >
> > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > > Hi all,
> > > > > > >
> > > > > > > There's something broken in our env when we try to add new mons to
> > > > > > > existing clusters, confirmed on two clusters running mimic and
> > > > > > > nautilus. It's basically this issue
> > > > > > > https://tracker.ceph.com/issues/42830
> > > > > > >
> > > > > > > In case something is wrong with our puppet manifests, I'm trying 
> > > > > > > to
> > > > > > > doing it manually.
> > > > > > >
> > > > > > > First we --mkfs the mon and start it, but as soon as the new mon
> > > > > > > starts synchronizing, the existing leader becomes unresponsive 
> > > > > > > and an
> > > > > > > election is triggered.
> > > > > > >
> > > > > > > Here's exactly what I'm doing:
> > > > > > >
> > > > > > > # cd /var/lib/ceph/tmp/
> > > > > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > > > > keyring.mon.cephmon4
> > > > > > > # ceph mon getmap -o monmap
> > > > > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > > > > # vi /etc/ceph/ceph.conf 
> > > > > > > [mon.cephmon4]
> > > > > > > host = cephmon4
> > > > > > > mon addr = a.b.c.d:6790
> > > > > > > # systemctl start ceph-mon@cephmon4
> > > > > > >
> > > > > > > The log file on the new mon shows it start synchronizing, then
> > > > > > > immediately the CPU usage on the leader goes to 100% and elections
> > > > > > > start happening, and ceph health shows mon slow ops. perf top of 
> > > > > > > the
> > > > > > > ceph-mon with 100% CPU is shown below [1].
> > > > > > > On a small nautilus cluster, the new mon gets added withing a 
> > > > > > > minute
> > > > > > > or so (but not cleanly -- the leader is unresponsive for quite 
> > > > > > > awhile
> > > > > > > until the new mon joins). debug_mon=20 on the leader doesn't show
> > > > > > > anything very interesting.
> > > > > > > On our large mimic cluster we tried waiting more than 10 minutes 
> > > > > > > --
> > > > > > > suffering through several mon elections and 100% usage bouncing 
> > > > > > > around
> > > > > > > between leaders -- until we gave up.
> > > > > > >
> > > > > > > I'm pulling my hair out a bit on this -- it's really weird!
> > > > > >
> > > > > > Can you try running a rocksdb compaction on the existing mons before
> > > > > > adding the new one and see if that helps?
> > > > >
> > > > > It doesn't help. I compacted the 3 mons in quorum then started a new
> > > > > one with debug mon & paxos = 20.
> > > > >
> > > > > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> > > > >
> > > > > I stopped that new mon as soon as the sync source started spinning
> > > > > 100% and left the quorum.
> > > >
> > > > Can you include the log from teh sync source too?  That's presumably 
> > > > where
> > > > the bug is.
> > >
> > > Here's a different new mon and the leader, with debug_paxos & mon = 20:
> > >
> > > ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055
> > >
> > > Things start to go wrong at this line:
> > >
> > > 2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
> > > e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2
> > >
> > > ...which is the just before it tries to sync osd_snap.
> > >
> > > I also included the output of ceph-monstore-tool dump-keys. There are
> > > really a lot of osd_snap keys!
> >
> > Aha, I knew this sounded familiar! See
> > https://github.com/ceph/ceph/pull/31581
> >
> > We should backport this for the next nautilus...
> >
>
> Perfect.. thanks!!

Sage, do you think I can work around this by setting
mon_sync_max_payload_size ridiculously small, like 1024 or something
like that?

-- dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote:
> On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster  wrote:
> >
> > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil  wrote:
> > >
> > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
> > > > >
> > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > Hi Sage,
> > > > > >
> > > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  wrote:
> > > > > > >
> > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > > > Hi all,
> > > > > > > >
> > > > > > > > There's something broken in our env when we try to add new mons 
> > > > > > > > to
> > > > > > > > existing clusters, confirmed on two clusters running mimic and
> > > > > > > > nautilus. It's basically this issue
> > > > > > > > https://tracker.ceph.com/issues/42830
> > > > > > > >
> > > > > > > > In case something is wrong with our puppet manifests, I'm 
> > > > > > > > trying to
> > > > > > > > doing it manually.
> > > > > > > >
> > > > > > > > First we --mkfs the mon and start it, but as soon as the new mon
> > > > > > > > starts synchronizing, the existing leader becomes unresponsive 
> > > > > > > > and an
> > > > > > > > election is triggered.
> > > > > > > >
> > > > > > > > Here's exactly what I'm doing:
> > > > > > > >
> > > > > > > > # cd /var/lib/ceph/tmp/
> > > > > > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > > > > > keyring.mon.cephmon4
> > > > > > > > # ceph mon getmap -o monmap
> > > > > > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > > > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > > > > > # vi /etc/ceph/ceph.conf  > > > > > > > this>
> > > > > > > > [mon.cephmon4]
> > > > > > > > host = cephmon4
> > > > > > > > mon addr = a.b.c.d:6790
> > > > > > > > # systemctl start ceph-mon@cephmon4
> > > > > > > >
> > > > > > > > The log file on the new mon shows it start synchronizing, then
> > > > > > > > immediately the CPU usage on the leader goes to 100% and 
> > > > > > > > elections
> > > > > > > > start happening, and ceph health shows mon slow ops. perf top 
> > > > > > > > of the
> > > > > > > > ceph-mon with 100% CPU is shown below [1].
> > > > > > > > On a small nautilus cluster, the new mon gets added withing a 
> > > > > > > > minute
> > > > > > > > or so (but not cleanly -- the leader is unresponsive for quite 
> > > > > > > > awhile
> > > > > > > > until the new mon joins). debug_mon=20 on the leader doesn't 
> > > > > > > > show
> > > > > > > > anything very interesting.
> > > > > > > > On our large mimic cluster we tried waiting more than 10 
> > > > > > > > minutes --
> > > > > > > > suffering through several mon elections and 100% usage bouncing 
> > > > > > > > around
> > > > > > > > between leaders -- until we gave up.
> > > > > > > >
> > > > > > > > I'm pulling my hair out a bit on this -- it's really weird!
> > > > > > >
> > > > > > > Can you try running a rocksdb compaction on the existing mons 
> > > > > > > before
> > > > > > > adding the new one and see if that helps?
> > > > > >
> > > > > > It doesn't help. I compacted the 3 mons in quorum then started a new
> > > > > > one with debug mon & paxos = 20.
> > > > > >
> > > > > > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> > > > > >
> > > > > > I stopped that new mon as soon as the sync source started spinning
> > > > > > 100% and left the quorum.
> > > > >
> > > > > Can you include the log from teh sync source too?  That's presumably 
> > > > > where
> > > > > the bug is.
> > > >
> > > > Here's a different new mon and the leader, with debug_paxos & mon = 20:
> > > >
> > > > ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055
> > > >
> > > > Things start to go wrong at this line:
> > > >
> > > > 2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
> > > > e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2
> > > >
> > > > ...which is the just before it tries to sync osd_snap.
> > > >
> > > > I also included the output of ceph-monstore-tool dump-keys. There are
> > > > really a lot of osd_snap keys!
> > >
> > > Aha, I knew this sounded familiar! See
> > > https://github.com/ceph/ceph/pull/31581
> > >
> > > We should backport this for the next nautilus...
> > >
> >
> > Perfect.. thanks!!
> 
> Sage, do you think I can workaround by setting
> mon_sync_max_payload_size ridiculously small, like 1024 or something
> like that?

Yeah... IIRC that is how the original user worked around the problem. I 
think they use 64 or 128 KB.

sage

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:19 PM Sage Weil  wrote:
>
> On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster  wrote:
> > >
> > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil  wrote:
> > > >
> > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil  wrote:
> > > > > >
> > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > > Hi Sage,
> > > > > > >
> > > > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil  
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote:
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > There's something broken in our env when we try to add new 
> > > > > > > > > mons to
> > > > > > > > > existing clusters, confirmed on two clusters running mimic and
> > > > > > > > > nautilus. It's basically this issue
> > > > > > > > > https://tracker.ceph.com/issues/42830
> > > > > > > > >
> > > > > > > > > In case something is wrong with our puppet manifests, I'm 
> > > > > > > > > trying to
> > > > > > > > > doing it manually.
> > > > > > > > >
> > > > > > > > > First we --mkfs the mon and start it, but as soon as the new 
> > > > > > > > > mon
> > > > > > > > > starts synchronizing, the existing leader becomes 
> > > > > > > > > unresponsive and an
> > > > > > > > > election is triggered.
> > > > > > > > >
> > > > > > > > > Here's exactly what I'm doing:
> > > > > > > > >
> > > > > > > > > # cd /var/lib/ceph/tmp/
> > > > > > > > > # scp cephmon1:/var/lib/ceph/tmp/keyring.mon.cephmon1 
> > > > > > > > > keyring.mon.cephmon4
> > > > > > > > > # ceph mon getmap -o monmap
> > > > > > > > > # ceph-mon --mkfs -i cephmon4 --monmap monmap --keyrin
> > > > > > > > > keyring.mon.cephmon4 --setuser ceph --setgroup ceph
> > > > > > > > > # vi /etc/ceph/ceph.conf  > > > > > > > > this>
> > > > > > > > > [mon.cephmon4]
> > > > > > > > > host = cephmon4
> > > > > > > > > mon addr = a.b.c.d:6790
> > > > > > > > > # systemctl start ceph-mon@cephmon4
> > > > > > > > >
> > > > > > > > > The log file on the new mon shows it start synchronizing, then
> > > > > > > > > immediately the CPU usage on the leader goes to 100% and 
> > > > > > > > > elections
> > > > > > > > > start happening, and ceph health shows mon slow ops. perf top 
> > > > > > > > > of the
> > > > > > > > > ceph-mon with 100% CPU is shown below [1].
> > > > > > > > > On a small nautilus cluster, the new mon gets added withing a 
> > > > > > > > > minute
> > > > > > > > > or so (but not cleanly -- the leader is unresponsive for 
> > > > > > > > > quite awhile
> > > > > > > > > until the new mon joins). debug_mon=20 on the leader doesn't 
> > > > > > > > > show
> > > > > > > > > anything very interesting.
> > > > > > > > > On our large mimic cluster we tried waiting more than 10 
> > > > > > > > > minutes --
> > > > > > > > > suffering through several mon elections and 100% usage 
> > > > > > > > > bouncing around
> > > > > > > > > between leaders -- until we gave up.
> > > > > > > > >
> > > > > > > > > I'm pulling my hair out a bit on this -- it's really weird!
> > > > > > > >
> > > > > > > > Can you try running a rocksdb compaction on the existing mons 
> > > > > > > > before
> > > > > > > > adding the new one and see if that helps?
> > > > > > >
> > > > > > > It doesn't help. I compacted the 3 mons in quorum then started a 
> > > > > > > new
> > > > > > > one with debug mon & paxos = 20.
> > > > > > >
> > > > > > > ceph-post-file: 9867d4ef-38cc-4ae7-9631-c6b86e83
> > > > > > >
> > > > > > > I stopped that new mon as soon as the sync source started spinning
> > > > > > > 100% and left the quorum.
> > > > > >
> > > > > > Can you include the log from teh sync source too?  That's 
> > > > > > presumably where
> > > > > > the bug is.
> > > > >
> > > > > Here's a different new mon and the leader, with debug_paxos & mon = 
> > > > > 20:
> > > > >
> > > > > ceph-post-file: 8db3d788-e266-4034-9d0c-4ee55eb1d055
> > > > >
> > > > > Things start to go wrong at this line:
> > > > >
> > > > > 2020-03-05 19:37:35.697 7f5fe87e2700 10 mon.p05517715y58557@0(leader)
> > > > > e32 handle_sync mon_sync(get_chunk cookie 170322296835) v2
> > > > >
> > > > > ...which is the just before it tries to sync osd_snap.
> > > > >
> > > > > I also included the output of ceph-monstore-tool dump-keys. There are
> > > > > really a lot of osd_snap keys!
> > > >
> > > > Aha, I knew this sounded familiar! See
> > > > https://github.com/ceph/ceph/pull/31581
> > > >
> > > > We should backport this for the next nautilus...
> > > >
> > >
> > > Perfect.. thanks!!
> >
> > Sage, do you think I can workaround by setting
> > mon_sync_max_payload_size ridiculously small, like 1024 or something
> > like that?
>
> Yeah... IIRC that is how the original user worked around the problem. I
> think they use 64 or 128 KB.

Nice... 64kB still triggered elections but 4kB worked. I have 5 mons again!
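
For the record, a sketch of how the workaround can be applied (value in
bytes; on clusters not using the central config store the same option can go
into ceph.conf under [mon] instead, and a mon restart may be needed for it to
take effect):

# ceph config set mon mon_sync_max_payload_size 4096
# ceph config dump | grep mon_sync_max_payload_size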

-- dan

>
> sage
>
>
_

[ceph-users] Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread Hermann Himmelbauer
Hi,
Does someone know if the following drive has decent performance in
a Ceph cluster:

Micron 5210 ION 1.92TB, SATA (MTFDDAK1T9QDE-2AV1ZABYY)

The specs state that the drive has power-loss protection; however, I'd
nevertheless like to make sure that all goes well with this disk.

Best Regards,
Hermann

-- 
herm...@qwer.tk
PGP/GPG: 299893C7 (on keyservers)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread Anthony D'Atri
That depends on how you define “decent”, and on your use case.

Be careful that these are QLC drives.  QLC is pretty new and longevity would 
seem to vary quite a bit based on op mix.  These might be fine for read-mostly 
workloads, but high-turnover databases might burn them up fast, especially as 
they fill up.

> On Mar 5, 2020, at 12:38 PM, Hermann Himmelbauer  wrote:
> 
> Hi,
> Does someone know if the following harddisk has a decent performance in
> a ceph cluster:
> 
> Micron 5210 ION 1.92TB, SATA (MTFDDAK1T9QDE-2AV1ZABYY)
> 
> The spec state, that the disk has power loss protection, however, I'd
> nevertheless like to make sure that all goes well with this disk.
> 
> Best Regards,
> Hermann
> 
> -- 
> herm...@qwer.tk
> PGP/GPG: 299893C7 (on keyservers)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread DHilsbos
Simone;

What is your failure domain?

If you don't know your failure domain, can you provide the CRUSH ruleset for the 
pool that experienced the "object unfound" error?
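
For example, something like this will show it (pool and rule names are
placeholders):

    # which CRUSH rule the pool uses
    ceph osd pool get <poolname> crush_rule

    # dump that rule; the failure domain is the "type" in the
    # choose/chooseleaf step
    ceph osd crush rule dump <rulename>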

Thank you,

Dominic L. Hilsbos, MBA 
Director - Information Technology 
Perform Air International Inc.
dhils...@performair.com 
www.PerformAir.com



-Original Message-
From: Simone Lazzaris [mailto:simone.lazza...@qcom.it] 
Sent: Thursday, March 05, 2020 6:11 AM
To: ceph-users; Chad William Seys
Subject: [ceph-users] Re: How can I fix "object unfound" error?

On Wednesday, 4 March 2020 at 18:14:31 CET, Chad William Seys wrote:
> > Maybe I've marked the object as "lost" and removed the failed
> > OSD.
> > 
> > The cluster now is healthy, but I'd like to understand if it's likely
> > to bother me again in the future.
> 
> Yeah, I don't know.
> 
> Within the last month there have been 4 separate instances of people
> mentioning "unfound" objects in their clusters.
> 
> I'm deferring as long as possible any OSD drive upgrades.  I ran into
> the problem when "draining" an OSD.
> 
> "draining" means remove OSD from crush map, wait for all PG to be stored
> elsewhere, then replace drive with larger one.  Under those
> circumstances there should be no PG unfound.
> 
> BTW, are you using cache tiering ?  The bug report mentions this, but
> some people did not have this enabled.
> 
> Chad.

No, I don't have cache tiering enabled. I also found it strange that the PG was
marked unfound: the cluster was perfectly healthy before the kernel panic, and a
single OSD failure shouldn't create much hassle.
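
For the archives, the relevant commands are roughly these (pg id is a
placeholder; marking objects lost throws data away, so it really is a last
resort):

    # locate the PG with unfound objects and see what it knows about them
    ceph health detail
    ceph pg <pgid> list_unfound
    ceph pg <pgid> query

    # last resort: give up on the unfound objects, either reverting to a
    # previous version if one exists...
    ceph pg <pgid> mark_unfound_lost revert
    # ...or forgetting them entirely
    ceph pg <pgid> mark_unfound_lost delete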


*Simone Lazzaris*
*Qcom S.p.A. a Socio Unico*

Via Roggia Vignola, 9 | 24047 Treviglio (BG)
T +39 0363 1970352 | M +39 3938111237

simone.lazza...@qcom.it[1] | www.qcom.it[2]
LinkedIn[3] | Facebook[4]

[1] mailto:simone.lazza...@qcom.it
[2] https://www.qcom.it
[3] https://www.linkedin.com/company/qcom-spa
[4] http://www.facebook.com/qcomspa
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Anthony D'Atri


> 
>>> Sage, do you think I can work around this by setting
>>> mon_sync_max_payload_size ridiculously small, like 1024 or something
>>> like that?
>> 
>> Yeah... IIRC that is how the original user worked around the problem. I
>> think they use 64 or 128 KB.
> 
> Nice... 64kB still triggered elections but 4kB worked. I have 5 mons again!
> 

I had an experience on 12.2.2 (Luminous) that seems as though it may have been 
related.

* mon02, not the lead, crashed with a DIMM error and rebooted.  The cluster 
rode it out just fine
* The next day mon02 was taken down gracefully to address the DIMM issue.  The 
lead mon’s memory footprint spiked and an election storm commenced.
* IIRC recovery required bringing back the second mon and restarting ceph-mon 
on the lead




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread mj

I have just ordered two of them to try. (the 3.47GB ION's)

If you want, next week I could perhaps run some commands on them..?
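
The tests people usually run to judge SSDs for Ceph are along these lines
(device path is a placeholder; both write to the raw device, so only run
them on empty disks):

    # 4k sync writes at queue depth 1, a rough proxy for WAL/journal use
    fio --name=syncwrite --filename=/dev/sdX --ioengine=libaio --direct=1 \
        --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 \
        --runtime=60 --time_based

    # 4k random writes at a higher queue depth, closer to RBD-style load
    fio --name=randwrite --filename=/dev/sdX --ioengine=libaio --direct=1 \
        --rw=randwrite --bs=4k --numjobs=4 --iodepth=32 \
        --runtime=60 --time_based --group_reporting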

MJ

On 3/5/20 9:38 PM, Hermann Himmelbauer wrote:

Hi,
Does anyone know whether the following drive has decent performance in
a Ceph cluster:

Micron 5210 ION 1.92TB, SATA (MTFDDAK1T9QDE-2AV1ZABYY)

The spec states that the disk has power loss protection; nevertheless, I'd
like to make sure that all goes well with this disk.

Best Regards,
Hermann


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io