[ceph-users] Re: ceph status reports: slow ops - this is related to long running process /usr/bin/ceph-osd

2020-02-18 Thread Wido den Hollander
On 10/8/19 3:53 PM, Thomas wrote: > Hi, > ceph status reports: > root@ld3955:~# ceph -s >   cluster: >     id: 6b1b5117-6e08-4843-93d6-2da3cf8a6bae >     health: HEALTH_ERR >     1 filesystem is degraded >     1 filesystem has a failed mds daemon >     1 filesystem is
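For anyone triaging a similar report, the usual first stops are the health detail and the admin socket of an affected OSD (osd.0 is a placeholder; run on the host carrying that OSD):

  $ ceph health detail                     # names the OSDs reporting slow ops
  $ ceph daemon osd.0 dump_ops_in_flight   # ops currently stuck on that OSD
  $ ceph daemon osd.0 dump_historic_ops    # recently completed (slow) ops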

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-18 Thread Wido den Hollander
On 8/27/19 11:49 PM, Bryan Stillwell wrote: > We've run into a problem on our test cluster this afternoon which is running > Nautilus (14.2.2). It seems that any time PGs move on the cluster (from > marking an OSD down, setting the primary-affinity to 0, or by using the > balancer), a large
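For reference, a sketch of the PG-moving actions named above, plus the check that surfaces the resulting slow requests (osd.4 is a placeholder):

  $ ceph osd down osd.4                 # mark an OSD down; it rejoins and PGs peer
  $ ceph osd primary-affinity osd.4 0   # or demote it from primary
  $ ceph health detail | grep -i slow   # then watch for slow requests piling up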

[ceph-users] Re: Excessive write load on mons after upgrade from 12.2.13 -> 14.2.7

2020-02-18 Thread Peter Woodman
Yeah, applied that command. For some reason, after 3 days of this, the behavior calmed down, and the size of the mon store shrank down to ~100MB, where previously it was growing to upwards of 6GB. On Mon, Feb 17, 2020 at 3:14 AM Dan van der Ster wrote: > This means it has been applied: > > # ce
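For anyone hitting the same growth, a sketch of how to watch the store and compact it by hand (default paths assumed; the mon name is derived from the hostname here):

  $ du -sh /var/lib/ceph/mon/ceph-$(hostname -s)/store.db   # on-disk mon store size
  $ ceph tell mon.$(hostname -s) compact                    # force a RocksDB compaction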

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-18 Thread Paul Emmerich
I've also seen this problem once on Nautilus, with no obvious reason for the slowness. In my case it was a rather old cluster that had been upgraded all the way from Firefly -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 Mü

[ceph-users] Pool on limited number of OSDs

2020-02-18 Thread Jacek Suchenia
Hello, I have a cluster (Nautilus 14.2.4) where I'd like to keep one pool on dedicated OSDs. So I set up a rule that covers *3* dedicated OSDs (using device classes) and assigned it to a pool with replication factor *3*. Only 10% of PGs were assigned and rebalanced, while the rest of them are stuck in *unders
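For context, a device-class rule like the one described is typically built along these lines (names are illustrative). Note that with a host failure domain, size=3 needs those three OSDs on three different hosts; if they share hosts, PGs stay undersized exactly as described:

  $ ceph osd crush rule create-replicated dedicated default host special
        # args: <rule-name> <root> <failure-domain> <device-class>
  $ ceph osd pool set mypool crush_rule dedicated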

[ceph-users] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-18 Thread Sean Matheny
Hi folks, Our entire cluster is down at the moment. We started upgrading from 12.2.13 to 14.2.7 with the monitors. The first monitor we upgraded crashed. We reverted to Luminous on this one and tried another, and it was fine. We upgraded the rest, and they all worked. Then we upgraded the firs
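As an aside for others mid-upgrade: while a quorum still exists, it's worth confirming what each daemon actually runs before touching the next one (available since Luminous):

  $ ceph versions   # per-daemon version counts across the cluster
  $ ceph mon stat   # quorum membership before upgrading the next mon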

[ceph-users] Re: EC Pools w/ RBD - IOPs

2020-02-18 Thread Anthony Brandelli (abrandel)
Added a fifth OSD node. Cluster now looks something like: 3x mons (2x 10G, 2x E5-2690 V2, 256GB RAM) 5x OSD (2x 10G, 2x E5-2690 V2, 256GB-385GB RAM, 12x Samsung SM1625 SSDs) Random write latency went up to a 16ms average with the addition of the fifth node and k=3,m=2. What kind of latencies are
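For anyone reproducing this, a k=3,m=2 RBD-on-EC setup is usually created roughly like this (pool/image names and PG counts are only examples; overwrites must be enabled for RBD on EC pools):

  $ ceph osd erasure-code-profile set ec32 k=3 m=2 crush-failure-domain=host
  $ ceph osd pool create ec-data 128 128 erasure ec32
  $ ceph osd pool set ec-data allow_ec_overwrites true   # required for RBD on EC
  $ rbd create rbd/bench-img --size 100G --data-pool ec-data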

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-18 Thread Sean Matheny
I wanted to add a specific question to the previous post, in the hopes it’s easier to answer. We have a Luminous monitor restored from the OSDs using ceph-objectstore-tool, which seems like the best chance of any success. We followed this rough process: https://tracker.ceph.com/issues/24419 The mon
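For readers finding this thread later, the rough process being referenced (rebuilding the mon store from the OSDs, as in the upstream disaster-recovery docs) looks approximately like this sketch; paths and the keyring location are assumptions:

  $ ms=/tmp/mon-store; mkdir -p $ms
  $ for osd in /var/lib/ceph/osd/ceph-*; do
  >   ceph-objectstore-tool --data-path $osd --op update-mon-db --mon-store-path $ms
  > done                       # repeat on every OSD host, carrying $ms along
  $ ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring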

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-18 Thread Wido den Hollander
On 2/19/20 5:45 AM, Sean Matheny wrote: > I wanted to add a specific question to the previous post, in the hopes it’s > easier to answer. > > We have a Luminous monitor restored from the OSDs using ceph-objectstore-tool, > which seems like the best chance of any success. We followed this rough > p

[ceph-users] Re: osd_pg_create causing slow requests in Nautilus

2020-02-18 Thread Wido den Hollander
On 2/18/20 6:54 PM, Paul Emmerich wrote: > I've also seen this problem on Nautilus with no obvious reason for the > slowness once. Did this resolve itself? Or did you remove the pool? > In my case it was a rather old cluster that was upgraded all the way > from firefly > > This cluster has

[ceph-users] Re: Pool on limited number of OSDs

2020-02-18 Thread Wido den Hollander
On 2/18/20 6:56 PM, Jacek Suchenia wrote: > Hello, > > I have a cluster (Nautilus 14.2.4) where I'd like to keep one pool on > dedicated OSDs. So I set up a rule that covers *3* dedicated OSDs (using > device classes) and assigned it to a pool with replication factor *3*. Only > 10% of PGs were ass

[ceph-users] Re: Performance of old vs new hw?

2020-02-18 Thread Martin Verges
Depends on your current SSDs and the new SSDs. It is highly likely that most of the performance increase will come from choosing good new NVMe. In addition, higher clock frequency will increase IOPS as well, but only if the CPU is the bottleneck. -- Martin Verges Managing director Hint: Secure one of the last slot

[ceph-users] Re: Performance of old vs new hw?

2020-02-18 Thread Виталий Филиппов
Hello Jesper, Assuming your SSDs are server ones with capacitors, I'd say the biggest impact will come from CPUs: up to half the latency, and up to 2x (or maybe 1.5x) more IOPS in parallel mode. If your SSDs are desktop ones, then new server NVMes will be a significant improvement for writes, too
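A common way to check whether a drive falls in the "server SSD with capacitors" class is a single-threaded sync random-write test with fio (a sketch; destructive against a raw device, so /dev/sdX must not hold data):

  $ fio --name=synctest --filename=/dev/sdX --ioengine=libaio --direct=1 \
        --fsync=1 --rw=randwrite --bs=4k --iodepth=1 --runtime=60
        # capacitor-backed drives sustain thousands of IOPS here; desktop drives collapse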

[ceph-users] Re: [FORGED] Lost all Monitors in Nautilus Upgrade, best way forward?

2020-02-18 Thread Sean Matheny
Thanks. If the OSDs have a newer epoch of the OSDMap than the MON it won't work. How can I verify this? (i.e. the epoch of the monitor vs the epoch of the OSD(s)) Cheers, Sean On 19/02/2020, at 7:25 PM, Wido den Hollander <w...@42on.com> wrote: On 2/19/20 5:45 AM, Sean Matheny wrote:
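One way to compare the two epochs with everything stopped (a sketch; default paths, and osd.0 is just an example):

  # newest OSDMap known to the (stopped) mon:
  $ ceph-monstore-tool /var/lib/ceph/mon/ceph-$(hostname -s) get osdmap -- --out /tmp/osdmap.mon
  $ osdmaptool --print /tmp/osdmap.mon | grep ^epoch

  # newest OSDMap on a (stopped) OSD:
  $ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op get-osdmap --file /tmp/osdmap.osd
  $ osdmaptool --print /tmp/osdmap.osd | grep ^epoch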