[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Hi, The ceph.log from when you upgraded should give some clues. Are you using upmap balancing? Maybe this is just further refinement of the balancing. -- dan On Thu, Mar 5, 2020 at 8:58 AM Rainer Krienke wrote: > > Hello, > > at the moment my ceph is still working but in a degraded state a
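
For reference, a minimal way to check whether upmap balancing is active, assuming a Luminous-or-later CLI (output format may vary by release):

  # show the active balancer mode (upmap vs crush-compat) and whether it is running
  ceph balancer status
  # list any pg_upmap_items entries the balancer has installed
  ceph osd dump | grep pg_upmap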

[ceph-users] Re: Identify slow ops

2020-03-05 Thread Thomas Schneider
Hi, I have stopped all 3 MON services sequentially. After starting the 3 MON services again, the slow ops were gone. However, just after 1 min. of MON service uptime, the slow ops are back again, and the blocked time is increasing constantly. root@ld3955:/home/ceph-scripts # ceph -w   cluster:  
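
A hedged sketch of how such slow ops can be inspected on a monitor (the mon name is a placeholder; this requires access to the admin socket on the mon host):

  # dump in-flight and recently blocked ops on a given monitor
  ceph daemon mon.ld3955 ops
  # the historic view keeps recently completed slow ops for inspection
  ceph daemon mon.ld3955 dump_historic_ops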

[ceph-users] Re: consistency of import-diff

2020-03-05 Thread Janne Johansson
Den tors 5 mars 2020 kl 08:13 skrev Stefan Priebe - Profihost AG < s.pri...@profihost.ag>: > >> Hrm. We have checksums on the actual OSD data, so it ought to be > >> possible to add these to the export/import/diff bits so it can be > >> verified faster. > >> (Well, barring bugs.) > >> > > I mainly
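
One way to approximate the verification discussed here, pending checksums in the diff format itself, is to checksum the export-diff stream on both ends (a sketch; pool, image, and snapshot names are placeholders):

  # write the incremental diff to a file and record its checksum
  rbd export-diff --from-snap snap1 rbd/myimage@snap2 /tmp/myimage.diff
  sha256sum /tmp/myimage.diff
  # after transfer, re-check the checksum on the destination, then apply
  sha256sum /tmp/myimage.diff
  rbd import-diff /tmp/myimage.diff rbd/myimage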

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
I found some information in ceph.log that might help to find out what happened. node2 was the one I rebooted:
2020-03-05 07:24:29.844953 osd.45 (osd.45) 483 : cluster [DBG] 36.323 scrub starts
2020-03-05 07:24:33.552221 osd.45 (osd.45) 484 : cluster [DBG] 36.323 scrub ok
2020-03-05 07:24:38.94840

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Did you have `144 total, 144 up, 144 in` also before the upgrade? If an osd was out, then you upgraded/restarted and it went back in, it would trigger data movement. (I usually set noin before an upgrade). -- dan On Thu, Mar 5, 2020 at 9:46 AM Rainer Krienke wrote: > > I found some information i
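
The noin flag Dan mentions can be set cluster-wide before the upgrade and cleared afterwards, e.g.:

  # prevent rebooted/restarted OSDs from being automatically marked in
  ceph osd set noin
  # ... perform the upgrade and restarts ...
  ceph osd unset noin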

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Janek Bevendorff
I also had some inadvertent recovery going on, although I think it started after I had restarted all MON, MGR, and MDS nodes and before I started restarting OSDs. On 05/03/2020 09:49, Dan van der Ster wrote: Did you have `144 total, 144 up, 144 in` also before the upgrade? If an osd was out,

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
Hello, before I ran the update to 14.2.8, I checked that the state was healthy with all OSDs up and in. I still have the command history I typed visible in my KDE terminal buffer, and there I see that after the update but before the reboot I ran a ceph -s and there were 144 OSDs up and in the state

[ceph-users] PGs unknown after pool creation (Nautilus 14.2.4/6)

2020-03-05 Thread dg
Hello, I have a small ceph cluster running with 3 MON/MGR and 3 OSD hosts. There are also 3 virtual hosts in the crushmap to have a separate SSD pool. Currently two pools are running, one of them exclusive to the SSD device class. My problem now is that any new pool I try to create won't b
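
A few commands that can help narrow down why a new pool's PGs stay unknown (pool name and PG id below are placeholders):

  # confirm which CRUSH rule the new pool uses
  ceph osd pool get newpool crush_rule
  # list the pool's PGs and their states
  ceph pg ls-by-pool newpool
  # check whether a given PG maps to any OSDs at all
  ceph pg map 5.0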

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Hi, There was movement already before you rebooted the node at 07:24:41.598004. That tells me that it was a ceph-mon process that restarted and either trimmed some upmaps or did something similar. You can do this to see exactly what changed: # ceph osd getmap -o 31853 31853 # this is a guess -- pi
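
Spelled out, the workflow Dan suggests looks roughly like this (epoch numbers come from the thread; pick epochs straddling the change):

  # fetch the two osdmap epochs to compare
  ceph osd getmap -o 31853 31853
  ceph osd getmap -o 31856 31856
  # render both to text and diff them
  osdmaptool --print 31853 > 31853.txt
  osdmaptool --print 31856 > 31856.txt
  diff 31853.txt 31856.txt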

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
The difference was not a big one and consists of a change in pgp_num for a pool named pxa-ec from 1024 to 999. All OSDs were up in the last map (31856):
# diff 31853.txt 31856.txt
1c1
< epoch 31853
---
> epoch 31856
4c4
< modified 2020-03-04 14:41:52.079327
---
> modified 2020-03-05 07:24:39.938

[ceph-users] Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
Hi all, There's something broken in our env when we try to add new mons to existing clusters, confirmed on two clusters running mimic and nautilus. It's basically this issue https://tracker.ceph.com/issues/42830 In case something is wrong with our puppet manifests, I'm trying to do it manually
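
For context, the manual procedure (per the upstream add-a-monitor docs) is roughly as follows (mon id, paths, and address are placeholders):

  # fetch the current monmap and mon keyring from the running cluster
  ceph mon getmap -o /tmp/monmap
  ceph auth get mon. -o /tmp/keyring
  # initialize the new mon's data directory and start it
  ceph-mon -i newmon --mkfs --monmap /tmp/monmap --keyring /tmp/keyring
  ceph-mon -i newmon --public-addr 192.168.0.10:6789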

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Dan van der Ster
Ahh that's it! You have `autoscale_mode on` for the pool, and in 14.2.8 there was a fix to calculating how many PGs are needed in an erasure coded pool: https://github.com/ceph/ceph/commit/0253205ef36acc6759a3a9687c5eb1b27aa901bf So at the moment your PGs are merging. If you want to stop that ch
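
To stop the autoscaler from acting on a pool while keeping its advice visible, the mode can be set per pool, e.g. (pool name taken from the thread):

  # report instead of act: the autoscaler will warn rather than change pg_num
  ceph osd pool set pxa-ec pg_autoscale_mode warn
  # check what the autoscaler currently wants for each pool
  ceph osd pool autoscale-status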

[ceph-users] Re: Unexpected recovering after nautilus 14.2.7 -> 14.2.8

2020-03-05 Thread Rainer Krienke
OK, this seems to make sense. At the moment the cluster is still busy handling misplaced objects, but when it's done, I will set autoscale to "warn" and also set the no...-Flags and then try to upgrade the next monitor and see if this works smoother. Thank you very much for your help. I learned a

[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread Simone Lazzaris
In data mercoledì 4 marzo 2020 18:14:31 CET, Chad William Seys ha scritto: > > Maybe I've marked the object as "lost" and removed the failed > > OSD. > > > > The cluster now is healthy, but I'd like to understand if it's likely > > to bother me again in the future. > > Yeah, I don't know. >

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > Hi all, > > There's something broken in our env when we try to add new mons to > existing clusters, confirmed on two clusters running mimic and > nautilus. It's basically this issue > https://tracker.ceph.com/issues/42830 > > In case something is wron

[ceph-users] ceph-mon store.db disk usage increase on OSD-Host fail

2020-03-05 Thread Hartwig Hauschild
Hi, I'm (still) testing upgrading from Luminous to Nautilus and ran into the following situation: The lab-setup I'm testing in has three OSD-Hosts. If one of those hosts dies, the store.db in /var/lib/ceph/mon/ on all my Mon-Nodes starts to rapidly grow in size until either the OSD-host comes ba
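
Hedged note: monitors retain old cluster maps until all PGs are clean again, so growth while an OSD host is down is expected; a couple of commands to watch the store and, once healthy, reclaim space (mon id is a placeholder):

  # watch the store size on a mon node
  du -sh /var/lib/ceph/mon/*/store.db
  # after the cluster is healthy again, ask a mon to compact its store
  ceph tell mon.mon1 compact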

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Wido den Hollander
On 3/5/20 3:22 PM, Sage Weil wrote: > On Thu, 5 Mar 2020, Dan van der Ster wrote: >> Hi all, >> >> There's something broken in our env when we try to add new mons to >> existing clusters, confirmed on two clusters running mimic and >> nautilus. It's basically this issue >> https://tracker.ceph.c

[ceph-users] Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after migrating from Ceph Luminous to Nautilus.

2020-03-05 Thread Casey Bodley
On 3/3/20 2:33 PM, Scheurer François wrote: /(resending to the new maillist)/ Dear Casey, Dear All, We tested the migration from Luminous to Nautilus and noticed two regressions breaking the RGW integration in Openstack: 1)  the following config parameter is not working on Nautilu
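
Assuming the parameter in question is the implicit-tenants toggle named in the subject (an assumption; the truncated message does not confirm it), it would sit in the RGW section of ceph.conf, e.g.:

  [client.rgw.gateway1]
      # map each user into its own tenant namespace (section name is a placeholder)
      rgw keystone implicit tenants = true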

[ceph-users] Re: Fw: Incompatibilities (implicit_tenants & barbican) with Openstack after migrating from Ceph Luminous to Nautilus.

2020-03-05 Thread Scheurer François
Dear Casey Many thanks, it's great to get your help! Cheers Francois From: Casey Bodley Sent: Thursday, March 5, 2020 3:57 PM To: Scheurer François; ceph-users@ceph.io Cc: Engelmann Florian; Rafael Weingärtner Subject: Re: Fw: Incompatibilities (impl

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
Hi Sage, On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > Hi all, > > > > There's something broken in our env when we try to add new mons to > > existing clusters, confirmed on two clusters running mimic and > > nautilus. It's basically this issu

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 3:31 PM Wido den Hollander wrote: > > > > On 3/5/20 3:22 PM, Sage Weil wrote: > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > >> Hi all, > >> > >> There's something broken in our env when we try to add new mons to > >> existing clusters, confirmed on two clusters running m

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > Hi Sage, > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > Hi all, > > > > > > There's something broken in our env when we try to add new mons to > > > existing clusters, confirmed on two clu

[ceph-users] Re: Error in Telemetry Module

2020-03-05 Thread Lenz Grimmer
On 2020-03-05 04:22, Anthony D'Atri wrote: >>> The message HEALTH_ERR, in red, on the front of the dashboard, is an >>> interesting way to start the day. ;) >> >> If possible, I'd suggest to change this into a HEALTH_WARN state - >> heaven is not falling down just because the telemetry module can'

[ceph-users] rbd-mirror - which direction?

2020-03-05 Thread Ml Ml
Hello, I am running Luminous and I would like to back up my cluster from Site-A to Site-B (one way), so I decided to mirror it to an off-site ceph cluster. I read: https://docs.ceph.com/docs/luminous/rbd/rbd-mirroring/ But I liked https://github.com/MiracleMa/Blog/issues/2 a little better. Bu
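
A minimal sketch of one-way pool mirroring along the lines of the Luminous docs (cluster, pool, and image names are placeholders; for one-way replication the rbd-mirror daemon runs only on the backup site):

  # on both clusters: enable pool-mode mirroring
  rbd --cluster site-a mirror pool enable rbd pool
  rbd --cluster site-b mirror pool enable rbd pool
  # images need the journaling feature (which requires exclusive-lock) to be mirrored
  rbd --cluster site-a feature enable rbd/myimage exclusive-lock journaling
  # on the backup site: register the primary cluster as a peer,
  # then run the rbd-mirror daemon on site-b only
  rbd --cluster site-b mirror pool peer add rbd client.site-a@site-a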

[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread Chad William Seys
No, I don't have cache tiering enabled. I also found it strange that the PG was marked unfound: the cluster was perfectly healthy before the kernel panic, and a single OSD failure shouldn't create much hassle. Yes, it is a bug unless using a singly replicated pool! C. __

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > Hi Sage, > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > Hi all, > > > > > > > > There's something broken in our env when we

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > Hi Sage, > > > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > > Hi all

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote: > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > Hi Sage, > > > > > > > > On Thu, Mar 5, 2020 at 3:22 PM Sage Weil wrote: > > > > > >

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote: > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote: > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > > Hi Sage, > >

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Sage Weil
On Thu, 5 Mar 2020, Dan van der Ster wrote: > On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote: > > > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote: > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > On Thu, Mar 5, 2020 at 4:42 PM Sage Weil wrote: > > > > > > > > > > On Thu,

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Dan van der Ster
On Thu, Mar 5, 2020 at 8:19 PM Sage Weil wrote: > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > On Thu, Mar 5, 2020 at 8:07 PM Dan van der Ster wrote: > > > > > > On Thu, Mar 5, 2020 at 8:05 PM Sage Weil wrote: > > > > > > > > On Thu, 5 Mar 2020, Dan van der Ster wrote: > > > > > On Thu, Mar

[ceph-users] Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread Hermann Himmelbauer
Hi, Does someone know if the following hard disk has decent performance in a ceph cluster: Micron 5210 ION 1.92TB, SATA (MTFDDAK1T9QDE-2AV1ZABYY) The specs state that the disk has power-loss protection; however, I'd nevertheless like to make sure that all goes well with this disk. Best Regards,

[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread Anthony D'Atri
That depends on how you define “decent”, and your use case. Be careful that these are QLC drives. QLC is pretty new, and longevity would seem to vary quite a bit based on op mix. These might be fine for read-mostly workloads, but high-turnover databases might burn them up fast, especially as

[ceph-users] Re: How can I fix "object unfound" error?

2020-03-05 Thread DHilsbos
Simone; What is your failure domain? If you don't know your failure domain, can you provide the CRUSH ruleset for the pool that experienced the "object unfound" error? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International Inc. dhils...@performair.com
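
For reference, the failure domain can be read off the pool's CRUSH rule (pool and rule names are placeholders):

  # find which rule the pool uses
  ceph osd pool get mypool crush_rule
  # dump the rule; the 'chooseleaf ... type <X>' step shows the failure domain
  ceph osd crush rule dump replicated_rule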

[ceph-users] Re: Can't add a ceph-mon to existing large cluster

2020-03-05 Thread Anthony D'Atri
> >>> Sage, do you think I can workaround by setting >>> mon_sync_max_payload_size ridiculously small, like 1024 or something >>> like that? >> >> Yeah... IIRC that is how the original user worked around the problem. I >> think they use 64 or 128 KB. > > Nice... 64kB still triggered elections
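
The workaround discussed amounts to lowering the sync payload size, e.g. (value taken from the thread; on pre-Mimic clusters injectargs or ceph.conf would be used instead of `ceph config set`):

  # shrink mon sync messages so a syncing mon doesn't stall its provider
  ceph config set mon mon_sync_max_payload_size 1024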

[ceph-users] Re: Ceph Performance of Micron 5210 SATA?

2020-03-05 Thread mj
I have just ordered two of them to try (the 3.47GB IONs). If you want, next week I could perhaps run some commands on them? MJ On 3/5/20 9:38 PM, Hermann Himmelbauer wrote: Hi, Does someone know if the following harddisk has a decent performance in a ceph cluster: Micron 5210 ION 1.92TB, S
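
If useful, the usual journal-style test for such drives is single-depth synchronous 4k writes (a sketch; the device name is a placeholder, and the test destroys data on the target device):

  # O_DIRECT + per-write sync approximates OSD journal/WAL behaviour
  fio --name=sync-write-test --filename=/dev/sdX --direct=1 --sync=1 \
      --rw=write --bs=4k --iodepth=1 --numjobs=1 --runtime=60 --time_based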