[ceph-users] recovery from catastrophic mon and mds failure after reboot and ip address change

2022-06-27 Thread Florian Jonas
Dear experts, we have a small computing cluster with 21 OSDs, 3 monitors and 3 MDS running ceph version 13.2.10 on Ubuntu 18.04. A few days ago we had an unexpected reboot of all machines, as well as a change of the IP address of one machine, which was hosting an MDS as well as a monitor.
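
For a Mimic-era cluster the usual answer to a changed monitor IP is monmap surgery; a minimal sketch, assuming the affected mon's store.db survived the reboot (mon name and address are illustrative):

  $ systemctl stop ceph-mon@mon1                      # stop the affected monitor
  $ ceph-mon -i mon1 --extract-monmap /tmp/monmap     # dump its current monmap
  $ monmaptool --print /tmp/monmap                    # inspect the stale entry
  $ monmaptool --rm mon1 /tmp/monmap                  # remove the old address
  $ monmaptool --add mon1 10.0.0.5:6789 /tmp/monmap   # re-add with the new IP
  $ ceph-mon -i mon1 --inject-monmap /tmp/monmap      # write it back
  $ systemctl start ceph-mon@mon1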

[ceph-users] Re: Conversion to Cephadm

2022-06-27 Thread Redouane Kachach Elhichou
From the error message: 2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] it seems that you are not using the cephadm that corresponds to your ceph version. Please, try to get cephad
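
For reference, the Ceph docs suggest fetching a cephadm binary that matches the cluster's release straight from the corresponding branch (branch name illustrative):

  $ curl --silent --remote-name --location https://github.com/ceph/ceph/raw/quincy/src/cephadm/cephadm
  $ chmod +x cephadm
  $ ./cephadm version   # should match the cluster's ceph version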

[ceph-users] Re: scrubbing+deep+repair PGs since Upgrade

2022-06-27 Thread Marcus Müller
Hi Stefan, thanks for the fast reply. I did some research and have the following output: ~ $ rados list-inconsistent-pg {pool-name1} [] ~ $ rados list-inconsistent-pg {pool-name2} [] ~ $ rados list-inconsistent-pg {pool-name3} [] — ~ $ rados list-inconsistent-obj 7.989 {"epoch":3006349,"inco
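
For anyone following along: once list-inconsistent-obj names the damaged object, the usual next step is to schedule a repair on the PG reported in the thread:

  $ rados list-inconsistent-obj 7.989 --format=json-pretty   # full detail on the inconsistency
  $ ceph pg repair 7.989                                     # ask the primary OSD to repair it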

[ceph-users] recommended Linux distro for Ceph Pacific small cluster

2022-06-27 Thread Bobby
Hi, What is the recommended Linux distro for Ceph Pacific? I would like to set up a small cluster having around 4-5 OSDs, one monitor node and one client node. Earlier I have been using CentOS. Is it recommended to continue with CentOS, or should I go for another distro? Please do comment. Lookin

[ceph-users] Re: cephadm orch thinks hosts are offline

2022-06-27 Thread Thomas Roth
Hi Adam, no, this is the 'feature' where the reboot of a mgr host causes all known hosts to become unmanaged. > # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161 > mgr.server reply reply (1) Operation not permitted check-host failed: > Host 'lxbk0374' not found. Use 'ceph orch host ls
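
A workaround often reported for this state is failing over the active mgr so the orchestrator re-probes its hosts; a sketch (name the active mgr explicitly on older releases):

  $ ceph mgr fail        # hand off to a standby mgr
  $ ceph orch host ls    # hosts usually report back online shortly after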

[ceph-users] Multiple subnet single cluster

2022-06-27 Thread Tahder Xunil
Hi, I'm a bit confused about my setup, as I'm getting an error from ceph-ansible in regards to the rados gateways. I plan to implement one rgw per subnet (I have 3 public subnets, 192.168.50.x/24, 192.168.100.x/24 and 192.168.150.x/24, each with 2 servers of 16 OSDs) bu
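
For what it's worth, a single cluster can carry several public subnets in its configuration; a minimal sketch, assuming all three subnets above should be reachable by clients:

  [global]
  public_network = 192.168.50.0/24, 192.168.100.0/24, 192.168.150.0/24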

[ceph-users] Set device-class via service specification file

2022-06-27 Thread Robert Reihs
Hi, We are setting up a test cluster with cephadm. We would like to set different device classes for the OSDs. Is there a way to set this via the service specification yaml file? This is the configuration for the osd service: --- service_type: osd service_id: osd_mon_disk_layout_fast
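
As an interim approach, the device class can also be set (or corrected) after the OSDs exist; a sketch with an illustrative OSD id:

  $ ceph osd crush rm-device-class osd.0        # clear any auto-detected class first
  $ ceph osd crush set-device-class ssd osd.0   # assign the desired class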

[ceph-users] Re: Set device-class via service specification file

2022-06-27 Thread David Orman
Hi Robert, We had the same question and ended up creating a PR for this: https://github.com/ceph/ceph/pull/46480 - there are backports, as well, so I'd expect it will be in the next release or two. David On Mon, Jun 27, 2022 at 8:07 AM Robert Reihs wrote: > Hi, > We are setting up a test clust
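
Once that change is available, a spec carrying the device class might look roughly like the sketch below (field placement per the PR discussion; treat it as an assumption until it ships in your release):

  service_type: osd
  service_id: osd_mon_disk_layout_fast
  placement:
    host_pattern: '*'
  spec:
    data_devices:
      rotational: 0
    crush_device_class: ssd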

[ceph-users] runaway mon DB

2022-06-27 Thread Wyll Ingersoll
Running Ceph Pacific 16.2.7. We have a very large cluster with 3 monitors. One of the monitor DBs is > 2x the size of the other 2 and is growing constantly (store.db fills up) and eventually fills up the /var partition on that server. The monitor in question is not the leader. The cluster i
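
Two commonly suggested checks for a ballooning mon store are a manual compaction and a look at whether cluster state is preventing the mon from trimming old maps (mon name illustrative):

  $ ceph tell mon.mon1 compact              # compact that monitor's store.db
  $ ceph status                             # down OSDs or unclean PGs can block osdmap trimming
  $ du -sh /var/lib/ceph/mon/*/store.db     # watch whether the DB actually shrinks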

[ceph-users] Re: Conversion to Cephadm

2022-06-27 Thread Eugen Block
Hi, there are some defaults for container images when used with cephadm. If you didn't change anything you probably get docker.io... when running: ceph config dump | grep image global basic container_image docker.io/ceph/ceph@sha256... This is a
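
For illustration, the image cephadm uses can be inspected and overridden through the config database (image URL illustrative):

  $ ceph config dump | grep container_image
  $ ceph config set global container_image quay.io/ceph/ceph:v16.2.9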

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
Hello, I had already increased/changed those variables previously. I increased the pg_num to 128, which increased the number of PGs backfilling, but speed is still only at 30 MiB/s avg and it has been backfilling 23 PGs for the last several hours. Should I increase it higher than 128? I'm still tr
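
The variables usually meant here are the backfill/recovery throttles; a sketch of raising them at runtime (values illustrative, and client I/O may suffer):

  $ ceph config set osd osd_max_backfills 4         # more concurrent backfills per OSD
  $ ceph config set osd osd_recovery_max_active 8   # more in-flight recovery ops per OSD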

[ceph-users] Re: bunch of " received unsolicited reservation grant from osd" messages in log

2022-06-27 Thread Neha Ojha
This issue should be addressed by https://github.com/ceph/ceph/pull/46860. Thanks, Neha On Fri, Jun 24, 2022 at 2:53 AM Kenneth Waegeman wrote: > > Hi, > > I’ve updated the cluster to 17.2.0, but the log is still filled with these > entries: > > 2022-06-24T11:45:12.408944+02:00 osd031 ceph-osd[

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder wrote: > I think this is just how ceph is. Maybe you should post the output of > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an > idea whether what you look at is expected or not. As I wrote before, object > recovery is thro
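
For reference, the diagnostics being requested:

  $ ceph status
  $ ceph osd pool stats
  $ ceph df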

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Robert Gallop
I saw a major boost after having the sleep_hdd set to 0. Only after that did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k obj/sec. Eventually it tapered back down, but for me sleep was the key, and specifically in my case: osd_recovery_sleep_hdd On Mon, Jun 27, 2022 a
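
A sketch of the setting described, applied cluster-wide at runtime:

  $ ceph config set osd osd_recovery_sleep_hdd 0   # remove the inter-op sleep on HDD OSDs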

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
I would love to see those types of speeds. I tried setting it all the way to 0 and nothing; I did that before I sent the first email, maybe it was your old post I got it from. osd_recovery_sleep_hdd 0.00 override (mon[0.00]) On
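
When a mon-level override appears to have no effect, it is worth confirming what the daemons are actually running with (OSD id illustrative; the second command runs on that OSD's host):

  $ ceph config show osd.0 osd_recovery_sleep_hdd         # value the config db delivers
  $ ceph daemon osd.0 config get osd_recovery_sleep_hdd   # value live inside the daemon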

[ceph-users] calling ceph command from a crush_location_hook - fails to find sys.stdin.isatty()

2022-06-27 Thread Wyll Ingersoll
[ceph pacific 16.2.9] I have a crush_location_hook script, a small python3 script that figures out the correct root/chassis/host location for a particular OSD. Our map has 2 roots, one for all-SSD devices and another for HDDs, hence the need for the location hook. Without it, the SSD devi
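
A hook along these lines can sidestep the ceph CLI's isatty() crash by giving the subprocess a real stdin; a bash sketch (the device-class logic and bucket names are assumptions, not the poster's script):

  #!/bin/bash
  # hypothetical crush_location_hook: ceph invokes it as
  #   hook --cluster <name> --id <osd-id> --type osd
  # and expects a CRUSH location string on stdout.
  while [ $# -gt 0 ]; do
    case "$1" in
      --id) OSD_ID="$2"; shift 2 ;;
      *) shift ;;
    esac
  done
  # </dev/null hands the python ceph CLI a real stdin, so
  # sys.stdin.isatty() is not called on a missing stream
  CLASS=$(ceph osd crush get-device-class "osd.${OSD_ID}" </dev/null 2>/dev/null)
  if [ "$CLASS" = "ssd" ]; then
      echo "root=ssd host=$(hostname -s)"
  else
      echo "root=default host=$(hostname -s)"
  fi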

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
On Mon, Jun 27, 2022 at 11:08 PM Frank Schilder wrote: > Do you, by any chance, have SMR drives? This may not be stated on the > drive; check what the internet has to say. I also would have liked to see > the beginning of the ceph status: number of hosts, number of OSDs, up and > down, whatever. Can
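
Drives rarely advertise SMR directly, but the model string can be pulled and checked against vendor documentation; a sketch (device path illustrative):

  $ smartctl -i /dev/sda | grep -i -e model -e rotation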