[ceph-users] recovery from catastrophic mon and mds failure after reboot and ip address change
Dear experts,

We have a small computing cluster with 21 OSDs, 3 monitors and 3 MDS running ceph version 13.2.10 on Ubuntu 18.04. A few days ago we had an unexpected reboot of all machines, as well as a change of the IP address of one machine, which was hosting an MDS as well as a monitor. I am not exactly sure what played out during that night, but we lost quorum of all three monitors and no filesystem was visible anymore, so we are starting to get quite worried about data loss. We tried destroying and recreating the monitor whose IP address changed, but it did not help (which however might have been a mistake). Long story short, we tried to recover by adapting the changed IP address in the config and rebuilding the monitors using the information from the OSDs, following the procedure outlined here: https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds

We are now in a situation where ceph status shows the following:

  cluster:
    id:     61fd9a61-89d6-4383-a2e6-ec4f4a13830f
    health: HEALTH_WARN
            43 slow ops, oldest one blocked for 57132 sec, daemons [mon.dip01,mon.pc078,mon.pc147] have slow ops.

  services:
    mon: 3 daemons, quorum pc147,pc078,dip01
    mgr: dip01(active)
    osd: 22 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

The monitors show a quorum (I think that's a good start), but we do not see any of the pools that were previously there and also no filesystem is visible. Running "ceph fs status" shows all MDS are in standby and no filesystem is active. I looked into the HEALTH_WARN by checking journalctl -xe on the monitor machines and found errors of the type:

Jun 24 09:10:30 dip01 ceph-mon[69148]: 2022-06-24 09:10:30.978 7f0173e02700 -1 mon.dip01@2(peon) e15 get_health_metrics reporting 4 slow ops, oldest is osd_boot(osd.12 booted 0 features 4611087854031667195 v13031)

In order to check what is going on with the osd_boot error, I checked the logs on the OSD machines and found warnings such as:

2022-06-24 09:16:42.383 7fdc165d5c00  0 /build/ceph-13.2.10/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2022-06-24 09:16:42.383 7fdc165d5c00  0 _get_class not permitted to load kvs
2022-06-24 09:16:42.383 7fdc165d5c00  0 /build/ceph-13.2.10/src/cls/hello/cls_hello.cc:296: loading cls_hello
2022-06-24 09:16:42.383 7fdc165d5c00  0 _get_class not permitted to load lua
2022-06-24 09:16:42.387 7fdc165d5c00  0 _get_class not permitted to load sdk
2022-06-24 09:16:42.387 7fdc165d5c00  1 osd.6 13035 warning: got an error loading one or more classes: (1) Operation not permitted
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has features 288514051259236352, adjusting msgr requires for clients
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has features 288514051259236352 was 8705, adjusting msgr requires for mons
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has features 1009089991638532096, adjusting msgr requires for osds
2022-06-24 09:16:42.387 7fdc165d5c00  1 osd.6 13035 check_osdmap_features require_osd_release 0 ->
2022-06-24 09:16:44.527 7fdc165d5c00  0 osd.6 13035 load_pgs
2022-06-24 09:16:50.375 7fdc165d5c00  0 osd.6 13035 load_pgs opened 67 pgs
2022-06-24 09:16:50.375 7fdc165d5c00  0 osd.6 13035 using weightedpriority op queue with priority op cut off at 64.
2022-06-24 09:16:50.375 7fdc165d5c00 -1 osd.6 13035 log_to_monitors {default=true}
2022-06-24 09:16:50.383 7fdc165d5c00  0 osd.6 13035 done with init, starting boot process
2022-06-24 09:16:50.383 7fdc165d5c00  1 osd.6 13035 start_boot
2022-06-24 09:16:50.495 7fdbec933700  1 osd.6 pg_epoch: 13035 pg[5.1( v 2785'2 (0'0,2785'2] local-lis/les=12997/12999 n=1 ec=2782/2782 lis/c 12997/12997 les/c/f 12999/12999/0 12997/12997/12954) [6,17,14] r=0 lpr=13021 crt=2785'2 lcod 0'0 mlcod 0'0 unknown mbc={}] state: transitioning to Primary

The 21 OSDs themselves show as "exists,new" in ceph osd status, even though they remained untouched during the whole incident (which I hope means they still contain all our data somewhere).

We only started operating our distributed filesystem about one year ago, and I must admit that with this problem we are a bit out of our depth, so we would very much appreciate any leads/help we can get on getting our filesystem up and running again. Alternatively, if all else fails, we would also appreciate any information about the possibility of recovering the data from the 21 OSDs, which amounts to over 60 TB. Attached you find our ceph.conf file, as well as the logs from one example monitor and one OSD node. If you need any other information, let us know. Thank you in advance for your help, I know your time is valuable!

Best regards,
Florian Jonas

p.s. to the moderators: this message is a resubmit with a smaller log
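For reference, the OSD-based rebuild in the linked documentation boils down to roughly the following loop. This is a simplified sketch only: all OSDs must be stopped, the loop has to be repeated on every OSD host while carrying the temporary store along (e.g. with rsync), the keyring path is a placeholder, and the exact flags should be checked against the docs for your release.

# collect cluster map information from every (stopped) OSD into a temporary store
ms=/root/mon-store
mkdir -p $ms
for osd in /var/lib/ceph/osd/ceph-*; do
    ceph-objectstore-tool --data-path "$osd" --no-mon-config \
        --op update-mon-db --mon-store-path "$ms"
done
# repeat on every OSD host, rsync'ing $ms between hosts so it accumulates all maps

# then rebuild a monitor store from the collected maps (needs a keyring with mon caps)
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring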
[ceph-users] Re: Conversion to Cephadm
>From the error message: 2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] it seems that you are not using the cephadm that corresponds to your ceph version. Please, try to get cephadm for octopus. -Redo On Sun, Jun 26, 2022 at 4:07 AM Brent Kennedy wrote: > I successfully converted to cephadm after upgrading the cluster to octopus. > I am on CentOS 7 and am attempting to convert some of the nodes over to > rocky, but when I try to add a rocky node in and start the mgr or mon > service, it tries to start an octopus container and the service comes back > with an error. Is there a way to force it to start a quincy container on > the new host? > > > > I tried to start an upgrade, which did deploy the manager nodes to the new > hosts, but it failed converting the monitors and now one is dead ( a centos > 7 one ). It seems it can spin up quincy containers on the new nodes, but > because it failed upgrading, it still trying to deploy the octopus ones to > the new node. > > > > Cephadm log on new node: > > > > 2022-06-25 21:51:34,427 7f4748727b80 DEBUG stat: Copying blob > sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621 > > 2022-06-25 21:51:34,647 7f4748727b80 DEBUG stat: Copying blob > sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621 > > 2022-06-25 21:51:34,652 7f4748727b80 DEBUG stat: Copying blob > sha256:731c3beff4deece7d4e54bc26ecf6d99988b19ea8414524277d83bc5a5d6eb70 > > 2022-06-25 21:51:59,006 7f4748727b80 DEBUG stat: Copying config > sha256:2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73 > > 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Writing manifest to image > destination > > 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Storing signatures > > 2022-06-25 21:51:59,239 7f4748727b80 DEBUG stat: 167 167 > > 2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many > arguments: > [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] > > 2022-06-25 21:51:59,797 7f4748727b80 INFO Non-zero exit code 1 from > /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host > --entrypoint /usr/bin/ceph-mon --init -e > CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=tpixmon5 -e > CEPH_USE_RANDOM_NONCE=1 -v > /var/log/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861:/var/log/ceph:z -v > > /var/lib/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861/mon.tpixmon5:/var/lib/cep > h/mon/ceph-tpixmon5:z -v /tmp/ceph-tmp7xmra8lk:/tmp/keyring:z -v > /tmp/ceph-tmp7mid2k57:/tmp/config:z docker.io/ceph/ceph:v15 --mkfs -i > tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config > --keyring /tmp/keyring --setuser ceph --setgroup ceph > --default-log-to-file=false --default-log-to-journald=true > --default-log-to-stderr=false --default-mon-cluster-log-to-file=false > --default-mon-cluster-log-to-journald=true > --default-mon-cluster-log-to-stderr=false > > 2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too > many > arguments: > [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] > > > > Podman Images: > > REPOSITORY TAG IMAGE ID CREATEDSIZE > > quay.io/ceph/ceph e1d6a67b021e 2 weeks ago1.32 GB > > docker.io/ceph/ceph v15 2cf504fded39 13 months ago 1.05 GB > > > > I don't even know what that top one is because its not tagged and it > keeping > pulling it. Why would it be pulling a docker.io image ( only place to get > octopus images? )? 
> I also tried to force upgrade the older failed monitor but the cephadm tool
> says that the OS is too old. It's just odd to me that we would say go to
> containers because the OS won't matter, and then it actually still matters
> because podman versions are tied to newer images.
>
> -Brent
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
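For reference, a release-matched cephadm script can be fetched directly from the corresponding release branch (a sketch based on the standard installation docs; adjust the branch name to the cluster's actual release):

# fetch the cephadm script that matches the cluster's major release (octopus here)
curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm
./cephadm version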
[ceph-users] Re: scrubbing+deep+repair PGs since Upgrade
Hi Stefan,

thanks for the fast reply. I did some research and have the following output:

~ $ rados list-inconsistent-pg {pool-name1}
[]
~ $ rados list-inconsistent-pg {pool-name2}
[]
~ $ rados list-inconsistent-pg {pool-name3}
[]
—
~ $ rados list-inconsistent-obj 7.989
{"epoch":3006349,"inconsistents":[]}
~ $ rados list-inconsistent-obj 7.28f
{"epoch":3006337,"inconsistents":[]}
~ $ rados list-inconsistent-obj 7.603
{"epoch":3006329,"inconsistents":[]}

~ $ ceph config dump |grep osd_scrub_auto_repair
Is empty

$ ceph daemon mon.ceph4 config get osd_scrub_auto_repair
{
    "osd_scrub_auto_repair": "true"
}

What does this tell me now? The setting can be changed to false of course, but as list-inconsistent-obj shows something, I would like to find the reason for that first.

Regards
Marcus

> On 27.06.2022 at 08:56, Stefan Kooman wrote:
>
> On 6/27/22 08:48, Marcus Müller wrote:
>> Hi all,
>> we recently upgraded from Ceph Luminous (12.x) to Ceph Octopus (15.x) (of
>> course with Mimic and Nautilus in between). Since this upgrade we see a
>> constant number of active+clean+scrubbing+deep+repair PGs. We never had this
>> in the past, now every time (like 10 or 20 PGs at the same time with the
>> +repair flag).
>> Does anyone know how to debug this more in detail?
>
> ceph daemon mon.$mon-id config get osd_scrub_auto_repair
>
> ^^ This is disabled by default (Octopus 15.2.16), but you might have this
> setting changed to true?
>
> ceph config dump |grep osd_scrub_auto_repair to check if it's a global
> setting.
>
> Do the following commands return any info?
>
> rados list-inconsistent-pg
> rados list-inconsistent-obj
>
> Gr. Stefan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
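A few commands that may help narrow this down (a sketch; the PG id is a placeholder). As far as I understand, list-inconsistent-obj only reports what the last deep scrub of that particular PG recorded, so querying the PGs that currently carry the +repair flag is usually more telling, and the automatic repair can be switched off globally if unwanted:

# show which PGs are currently scrubbing/repairing
ceph pg dump pgs | grep repair

# query one of them in detail (PG id is a placeholder)
rados list-inconsistent-obj 7.989 --format=json-pretty

# disable automatic repair cluster-wide if the behaviour is unwanted
ceph config set osd osd_scrub_auto_repair false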
[ceph-users] recommended Linux distro for Ceph Pacific small cluster
Hi,

What is the recommended Linux distro for Ceph Pacific? I would like to set up a small cluster with around 4-5 OSDs, one monitor node and one client node. Earlier I have been using CentOS. Is it recommended to continue with CentOS, or should I go for another distro? Please do comment.

Looking forward to the reply.
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: cephadm orch thinks hosts are offline
Hi Adam,

no, this is the 'feature' where the reboot of a mgr host causes all known hosts to become unmanaged.

> # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
> mgr.server reply reply (1) Operation not permitted check-host failed:
> Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.

In some email on this issue that I can't find at the moment, someone describes a workaround that allows restarting the entire orchestrator business. But that sounded risky.

Regards
Thomas

On 23/06/2022 19.42, Adam King wrote:

Hi Thomas,

What happens if you run "ceph cephadm check-host <hostname>" for one of the hosts that is offline (and if that fails "ceph cephadm check-host <hostname> <addr>")? Usually, the hosts get marked offline when some ssh connection to them fails. The check-host command will attempt a connection and maybe let us see why it's failing, or, if there is no longer an issue connecting to the host, should mark the host online again.

Thanks,
- Adam King

On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth wrote:

Hi all,

found this bug https://tracker.ceph.com/issues/51629 (Octopus 15.2.13), reproduced it in Pacific and now again in Quincy:
- new cluster
- 3 mgr nodes
- reboot active mgr node
- (only in Quincy:) standby mgr node takes over, rebooted node becomes standby
- `ceph orch host ls` shows all hosts as `offline`
- add a new host: not offline

In my setup, hostnames and IPs are well known, thus

# ceph orch host ls
HOST      ADDR         LABELS  STATUS
lxbk0374  10.20.2.161  _admin  Offline
lxbk0375  10.20.2.162          Offline
lxbk0376  10.20.2.163          Offline
lxbk0377  10.20.2.164          Offline
lxbk0378  10.20.2.165          Offline
lxfs416   10.20.2.178          Offline
lxfs417   10.20.2.179          Offline
lxfs418   10.20.2.222          Offline
lxmds22   10.20.6.67
lxmds23   10.20.6.72           Offline
lxmds24   10.20.6.74           Offline

(All lxbk are mon nodes, the first 3 are mgr, 'lxmds22' was added after the fatal reboot.)

Does this matter at all? The old bug report is one year old, now with prio 'Low'. And some people must have rebooted the one or other host in their clusters... There is a cephfs on our cluster, operations seem to be unaffected.

Cheers
Thomas

--
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

--
Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453
Fax: +49-6159-71 2986

GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung: Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats: State Secretary / Staatssekretär Dr. Volkmar Dietz
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
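The workaround usually mentioned for this is simply to fail over the active mgr so that the cephadm/orchestrator module restarts and re-checks the hosts. A sketch, not verified against this particular bug; the host name and address are the ones from the listing above:

# restart the orchestrator by failing over to a standby mgr
ceph mgr fail

# afterwards, ask cephadm to re-verify one of the "offline" hosts
ceph cephadm check-host lxbk0374 10.20.2.161
ceph orch host ls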
[ceph-users] Multiple subnet single cluster
Hi,

I'm a bit confused about my setup, as I ran into issues using ceph-ansible and am getting an error in regards to the rados gateways. I plan to implement one RGW per subnet (I have 3 public subnets, 192.168.50.x/24, 192.168.100.x/24 and 192.168.150.x/24, each of which has 2 servers with 16 OSDs), but the cluster network is common to the 3 subnets. The Ceph cluster itself seems to be working well, except that I am stuck on the RGW implementation. Any insights on where to start? Or which configurations to alter? Or perhaps a new tool like cephadm?

Regards,
Mario
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
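One way to pin each gateway to its own subnet, independent of the deployment tool, is to set the beast frontend endpoint per RGW instance. A rough sketch only; the daemon names (gw-50 etc.) and addresses are placeholders and the config section names must match how your gateways were actually deployed:

# bind each radosgw instance to the address it should serve on
ceph config set client.rgw.gw-50  rgw_frontends "beast endpoint=192.168.50.10:8080"
ceph config set client.rgw.gw-100 rgw_frontends "beast endpoint=192.168.100.10:8080"
ceph config set client.rgw.gw-150 rgw_frontends "beast endpoint=192.168.150.10:8080"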
[ceph-users] Set device-class via service specification file
Hi,

We are setting up a test cluster with cephadm. We would like to set different device classes for the OSDs. Is there a possibility to set this via the service specification YAML file? This is the configuration for the OSD service:

---
service_type: osd
service_id: osd_mon_disk_layout_fast
placement:
  hosts:
    - fsn1-ceph-01
    - fsn1-ceph-02
    - fsn1-ceph-03
spec:
  data_devices:
    paths:
      - /dev/vdb
  encrypted: true
  journal_devices:
    paths:
      - /dev/vdc
  db_devices:
    paths:
      - /dev/vdc

We would then use this in the crush rule. Or is there another way to set this up?

Thanks
Best
Robert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Set device-class via service specification file
Hi Robert, We had the same question and ended up creating a PR for this: https://github.com/ceph/ceph/pull/46480 - there are backports, as well, so I'd expect it will be in the next release or two. David On Mon, Jun 27, 2022 at 8:07 AM Robert Reihs wrote: > Hi, > We are setting up a test cluster with cephadm. We would like to > set different device classes for the osd's . Is there a possibility to set > this via the service specification yaml file. This is the configuration for > the osd service: > > --- > service_type: osd > service_id: osd_mon_disk_layout_fast > placement: > hosts: > - fsn1-ceph-01 > - fsn1-ceph-02 > - fsn1-ceph-03 > spec: > data_devices: > paths: > - /dev/vdb > encrypted: true > journal_devices: > paths: > - /dev/vdc > db_devices: > paths: > - /dev/vdc > > We would use this than in the crush rule. Or is there another way to set > this up? > Thanks > Best > Robert > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
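Until that lands, a common interim approach is to let cephadm create the OSDs and then (re)assign the device class by hand before referencing it in a CRUSH rule. A sketch; the class name "fast", the OSD id and the rule name are placeholders:

# clear the auto-detected class and assign the desired one
ceph osd crush rm-device-class osd.3
ceph osd crush set-device-class fast osd.3

# a replicated CRUSH rule restricted to that class
ceph osd crush rule create-replicated fast-rule default host fast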
[ceph-users] runaway mon DB
Running Ceph Pacific 16.2.7.

We have a very large cluster with 3 monitors. One of the monitor DBs is > 2x the size of the other 2 and is growing constantly (store.db fills up) and eventually fills up the /var partition on that server. The monitor in question is not the leader.

The cluster itself is quite full, but currently we cannot remove any data due to its current mission requirements, so it is constantly in a state of rebalance and bumping up against the "toofull" limits.

How can we keep the monitor DB from growing so fast? Why is it only on a secondary monitor, not the primary? Can we force a monitor to compact its DB while the system is actively repairing?

Thanks,
Wyllys Ingersoll
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
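For the last question, compaction can be requested explicitly (a sketch; the mon id is a placeholder, and how much space this reclaims while the cluster is unhealthy depends on how many old maps the monitors still have to retain):

# trigger an online compaction of one monitor's store
ceph tell mon.<id> compact

# or compact automatically each time the mon daemon starts
ceph config set mon mon_compact_on_start true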
[ceph-users] Re: Conversion to Cephadm
Hi, there are some defaults for container images when used with cephadm. If you didn't change anything you probably get docker.io... when running: ceph config dump | grep image globalbasic container_image docker.io/ceph/ceph@sha256... This is a pacific one-node test cluster. If you want to set it to quay.io you can change it like this: # ceph config set global container_image quay.io/.../ceph-something I successfully converted to cephadm after upgrading the cluster to octopus. I am on CentOS 7 and am attempting to convert some of the nodes over to rocky, but when I try to add a rocky node in and start the mgr or mon service, it tries to start an octopus container and the service comes back with an error. Is there a way to force it to start a quincy container on the new host? Just to be clear, you upgraded to Octopus successfully, then tried to add new nodes with a newer OS and it tries to start an Octopus container, but that's expected, isn't it? Can you share more details which errors occur when you try to start Octopus containers? Zitat von Brent Kennedy : I successfully converted to cephadm after upgrading the cluster to octopus. I am on CentOS 7 and am attempting to convert some of the nodes over to rocky, but when I try to add a rocky node in and start the mgr or mon service, it tries to start an octopus container and the service comes back with an error. Is there a way to force it to start a quincy container on the new host? I tried to start an upgrade, which did deploy the manager nodes to the new hosts, but it failed converting the monitors and now one is dead ( a centos 7 one ). It seems it can spin up quincy containers on the new nodes, but because it failed upgrading, it still trying to deploy the octopus ones to the new node. Cephadm log on new node: 2022-06-25 21:51:34,427 7f4748727b80 DEBUG stat: Copying blob sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621 2022-06-25 21:51:34,647 7f4748727b80 DEBUG stat: Copying blob sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621 2022-06-25 21:51:34,652 7f4748727b80 DEBUG stat: Copying blob sha256:731c3beff4deece7d4e54bc26ecf6d99988b19ea8414524277d83bc5a5d6eb70 2022-06-25 21:51:59,006 7f4748727b80 DEBUG stat: Copying config sha256:2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Writing manifest to image destination 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Storing signatures 2022-06-25 21:51:59,239 7f4748727b80 DEBUG stat: 167 167 2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] 2022-06-25 21:51:59,797 7f4748727b80 INFO Non-zero exit code 1 from /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph-mon --init -e CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=tpixmon5 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861:/var/log/ceph:z -v /var/lib/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861/mon.tpixmon5:/var/lib/cep h/mon/ceph-tpixmon5:z -v /tmp/ceph-tmp7xmra8lk:/tmp/keyring:z -v /tmp/ceph-tmp7mid2k57:/tmp/config:z docker.io/ceph/ceph:v15 --mkfs -i tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config --keyring /tmp/keyring --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false --default-mon-cluster-log-to-file=false --default-mon-cluster-log-to-journald=true 
--default-mon-cluster-log-to-stderr=false 2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many arguments: [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true] Podman Images: REPOSITORY TAG IMAGE ID CREATEDSIZE quay.io/ceph/ceph e1d6a67b021e 2 weeks ago1.32 GB docker.io/ceph/ceph v15 2cf504fded39 13 months ago 1.05 GB I don't even know what that top one is because its not tagged and it keeping pulling it. Why would it be pulling a docker.io image ( only place to get octopus images? )? I also tried to force upgrade the older failed monitor but the cephadm tool says that the OS is too old. Its just odd to me that we would say go to containers cause the OS wont matter and then it actually still matters cause podman versions tied to newer images. -Brent ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph recovery network speed
Hello, I had already increased/changed those variables previously. I increased the pg_num to 128. Which increased the number of PG's backfilling, but speed is still only at 30 MiB/s avg and has been backfilling 23 pg for the last several hours. Should I increase it higher than 128? I'm still trying to figure out if this is just how ceph is or if there is a bottleneck somewhere. Like if I sftp a 10G file between servers it's done in a couple min or less. Am I thinking of this wrong? Thanks, Curt On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder wrote: > Hi Curt, > > as far as I understood, a 2+2 EC pool is recovering, which makes 1 OSD per > host busy. My experience is, that the algorithm for selecting PGs to > backfill/recover is not very smart. It could simply be that it doesn't find > more PGs without violating some of these settings: > > osd_max_backfills > osd_recovery_max_active > > I have never observed the second parameter to change anything (try any > ways). However, the first one has a large impact. You could try increasing > this slowly until recovery moves faster. Another parameter you might want > to try is > > osd_recovery_sleep_[hdd|ssd] > > Be careful as this will impact client IO. I could reduce the sleep for my > HDDs to 0.05. With your workload pattern, this might be something you can > tune as well. > > Having said that, I think you should increase your PG count on the EC pool > as soon as the cluster is healthy. You have only about 20 PGs per OSD and > large PGs will take unnecessarily long to recover. A higher PG count will > also make it easier for the scheduler to find PGs for recovery/backfill. > Aim for a number between 100 and 200. Give the pool(s) with most data > (#objects) the most PGs. > > Best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Curt > Sent: 24 June 2022 19:04 > To: Anthony D'Atri; ceph-users@ceph.io > Subject: [ceph-users] Re: Ceph recovery network speed > > 2 PG's shouldn't take hours to backfill in my opinion. Just 2TB enterprise > HD's. > > Take this log entry below, 72 minutes and still backfilling undersized? > Should it be that slow? > > pg 12.15 is stuck undersized for 72m, current state > active+undersized+degraded+remapped+backfilling, last acting > [34,10,29,NONE] > > Thanks, > Curt > > > On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri > wrote: > > > Your recovery is slow *because* there are only 2 PGs backfilling. > > > > What kind of OSD media are you using? > > > > > On Jun 24, 2022, at 09:46, Curt wrote: > > > > > > Hello, > > > > > > I'm trying to understand why my recovery is so slow with only 2 pg > > > backfilling. I'm only getting speeds of 3-4/MiB/s on a 10G network. I > > > have tested the speed between machines with a few tools and all confirm > > 10G > > > speed. I've tried changing various settings of priority and recovery > > sleep > > > hdd, but still the same. Is this a configuration issue or something > else? > > > > > > It's just a small cluster right now with 4 hosts, 11 osd's per. Please > > let > > > me know if you need more information. > > > > > > Thanks, > > > Curt > > > ___ > > > ceph-users mailing list -- ceph-users@ceph.io > > > To unsubscribe send an email to ceph-users-le...@ceph.io > > > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
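For reference, the knobs discussed in this thread can all be changed at runtime and reverted later (a sketch; the values shown are arbitrary starting points rather than recommendations, and the OSD id is a placeholder):

# raise the number of concurrent backfills per OSD, while watching client latency
ceph config set osd osd_max_backfills 4

# reduce the per-op recovery sleep for HDD OSDs (0 disables the throttle entirely)
ceph config set osd osd_recovery_sleep_hdd 0.05

# check the effective values on a specific OSD
ceph config show osd.19 | grep -e osd_max_backfills -e osd_recovery_sleep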
[ceph-users] Re: bunch of " received unsolicited reservation grant from osd" messages in log
This issue should be addressed by https://github.com/ceph/ceph/pull/46860. Thanks, Neha On Fri, Jun 24, 2022 at 2:53 AM Kenneth Waegeman wrote: > > Hi, > > I’ve updated the cluster to 17.2.0, but the log is still filled with these > entries: > > 2022-06-24T11:45:12.408944+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: > 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] > local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 > les/c/f=683270/683270/19430 sis=683269) > [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 > crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean > TIME_FOR_DEEP > ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]] > scrubber : handle_scrub_reserve_grant: received unsolicited > reservation grant from osd 357(2) (0x55e16d92f600) > 2022-06-24T11:45:12.412196+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: > 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] > local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 > les/c/f=683270/683270/19430 sis=683269) > [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 > crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean > TIME_FOR_DEEP > ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]] > scrubber : handle_scrub_reserve_grant: received unsolicited > reservation grant from osd 586(10) (0x55e1b57354a0) > 2022-06-24T11:45:12.417867+02:00 osd031 ceph-osd[21674]: osd.560 pg_epoch: > 685367 pg[5.6e2s0( v 685198'4133308 (680724'4126463,685198'4133308] > local-lis/les=675991/675992 n=375710 ec=1104/1104 lis/c=675991/675991 > les/c/f=675992/675992/19430 sis=675991) > [560,259,440,156,324,358,338,218,191,335,256]p560(0) r=0 lpr=675991 > crt=685198'4133308 lcod 685198'4133307 mlcod 685198'4133307 active+clean > TIME_FOR_DEEP > ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]] > scrubber : handle_scrub_reserve_grant: received unsolicited > reservation grant from osd 259(1) (0x559a5f371080) > 2022-06-24T11:45:12.453294+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: > 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] > local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 > les/c/f=683270/683270/19430 sis=683269) > [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 > crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean > TIME_FOR_DEEP > ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]] > scrubber : handle_scrub_reserve_grant: received unsolicited > reservation grant from osd 213(9) (0x55e1a9e922c0) > > Is the bug still there, or is this something else? > > Thanks!! > > Kenneth > > > > On 19 Dec 2021, at 11:05, Ronen Friedman wrote: > > > > On Sat, Dec 18, 2021 at 7:06 PM Ronen Friedman wrote: >> >> Hi all, >> >> This was indeed a bug, which I've already fixed in 'master'. >> I'll look for the backporting status tomorrow. >> >> Ronen >> > > The fix is part of a larger change (which fixes a more severe issue). Pending > (non-trivial) backport. > I'll try to speed this up. > > Ronen > > > > >> >> On Fri, Dec 17, 2021 at 1:49 PM Kenneth Waegeman >> wrote: >>> >>> Hi all, >>> >>> I'm also seeing these messages spamming the logs after update from >>> octopus to pacific 16.2.7. >>> >>> Any clue yet what this means? >>> >>> Thanks!! >>> >>> Kenneth >>> >>> On 29/10/2021 22:21, Alexander Y. 
Fomichev wrote: >>> > Hello. >>> > After upgrading to 'pacific' I found log spammed by messages like this: >>> > ... active+clean] scrubber pg(46.7aas0) handle_scrub_reserve_grant: >>> > received unsolicited reservation grant from osd 138(1) (0x560e77c51600) >>> > >>> > If I understand it correctly this is exactly what it looks, and this is >>> > not >>> > good. Running with debug osd 1/5 don't help much and google bring me >>> > nothing and I stuck. Could anybody give a hint what's happening or where >>> > to dig. >>> > >>> ___ >>> ceph-users mailing list -- ceph-users@ceph.io >>> To unsubscribe send an email to ceph-users-le...@ceph.io >>> > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph recovery network speed
On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder wrote: > I think this is just how ceph is. Maybe you should post the output of > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an > idea whether what you look at is expected or not. As I wrote before, object > recovery is throttled and the recovery bandwidth depends heavily on object > size. The interesting question is, how many objects per second are > recovered/rebalanced > data: pools: 11 pools, 369 pgs objects: 2.45M objects, 9.2 TiB usage: 20 TiB used, 60 TiB / 80 TiB avail pgs: 512136/9729081 objects misplaced (5.264%) 343 active+clean 22 active+remapped+backfilling io: client: 2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr recovery: 34 MiB/s, 8 objects/s Pool 12 is the only one with any stats. pool EC-22-Pool id 12 510048/9545052 objects misplaced (5.344%) recovery io 36 MiB/s, 9 objects/s client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr --- RAW STORAGE --- CLASSSIZE AVAILUSED RAW USED %RAW USED hdd80 TiB 60 TiB 20 TiB20 TiB 25.45 TOTAL 80 TiB 60 TiB 20 TiB20 TiB 25.45 --- POOLS --- POOLID PGS STORED OBJECTS USED %USED MAX AVAIL .mgr 11 152 MiB 38 457 MiB 0 9.2 TiB 21BadPool3 328 KiB1 12 KiB 0 18 TiB .rgw.root4 32 1.3 KiB4 48 KiB 0 9.2 TiB default.rgw.log 5 32 3.6 KiB 209 408 KiB 0 9.2 TiB default.rgw.control 6 32 0 B8 0 B 0 9.2 TiB default.rgw.meta 78 6.7 KiB 20 203 KiB 0 9.2 TiB rbd_rep_pool 8 32 2.0 MiB5 5.9 MiB 0 9.2 TiB default.rgw.buckets.index98 2.0 MiB 33 5.9 MiB 0 9.2 TiB default.rgw.buckets.non-ec 10 32 1.4 KiB0 4.3 KiB 0 9.2 TiB default.rgw.buckets.data11 32 232 GiB 61.02k 697 GiB 2.41 9.2 TiB EC-22-Pool 12 128 9.8 TiB2.39M 20 TiB 41.55 14 TiB > Maybe provide the output of the first two commands for > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a bit > after setting these and then collect the output). Include the applied > values for osd_max_backfills* and osd_recovery_max_active* for one of the > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e > osd_recovery_max_active). > I didn't notice any speed difference with sleep values changed, but I'll grab the stats between changes when I have a chance. ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active' osd_max_backfills1000 override mon[5] osd_recovery_max_active 1000 override osd_recovery_max_active_hdd 1000 override mon[5] osd_recovery_max_active_ssd 1000 override > > I don't really know if on such a small cluster one can expect more than > what you see. It has nothing to do with network speed if you have a 10G > line. However, recovery is something completely different from a full > link-speed copy. > > I can tell you that boatloads of tiny objects are a huge pain for > recovery, even on SSD. Ceph doesn't raid up sections of disks against each > other, but object for object. This might be a feature request: that PG > space allocation and recovery should follow the model of LVM extends > (ideally match with LVM extends) to allow recovery/rebalancing larger > chunks of storage in one go, containing parts of a large or many small > objects. > > Best regards, > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: Curt > Sent: 27 June 2022 17:35:19 > To: Frank Schilder > Cc: ceph-users@ceph.io > Subject: Re: [ceph-users] Re: Ceph recovery network speed > > Hello, > > I had already increased/changed those variables previously. I increased > the pg_num to 128. 
Which increased the number of PG's backfilling, but > speed is still only at 30 MiB/s avg and has been backfilling 23 pg for the > last several hours. Should I increase it higher than 128? > > I'm still trying to figure out if this is just how ceph is or if there is > a bottleneck somewhere. Like if I sftp a 10G file between servers it's > done in a couple min or less. Am I thinking of this wrong? > > Thanks, > Curt > > On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder fr...@dtu.dk>> wrote: > Hi Curt, > > as far as I understood, a 2+2 EC pool is recovering, which makes 1 OSD per > host busy. My experience is, that the algorithm for selecting PGs to > backfill/recover is not very smart. It could simply be that it doesn't find > more PGs without
[ceph-users] Re: Ceph recovery network speed
I saw a major boost after having the sleep_hdd set to 0. Only after that did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k obj/sec. Eventually it tapered back down, but for me sleep was the key, and specifically in my case: osd_recovery_sleep_hdd On Mon, Jun 27, 2022 at 11:17 AM Curt wrote: > On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder wrote: > > > I think this is just how ceph is. Maybe you should post the output of > > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an > > idea whether what you look at is expected or not. As I wrote before, > object > > recovery is throttled and the recovery bandwidth depends heavily on > object > > size. The interesting question is, how many objects per second are > > recovered/rebalanced > > > data: > pools: 11 pools, 369 pgs > objects: 2.45M objects, 9.2 TiB > usage: 20 TiB used, 60 TiB / 80 TiB avail > pgs: 512136/9729081 objects misplaced (5.264%) > 343 active+clean > 22 active+remapped+backfilling > > io: > client: 2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr > recovery: 34 MiB/s, 8 objects/s > > Pool 12 is the only one with any stats. > > pool EC-22-Pool id 12 > 510048/9545052 objects misplaced (5.344%) > recovery io 36 MiB/s, 9 objects/s > client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr > > --- RAW STORAGE --- > CLASSSIZE AVAILUSED RAW USED %RAW USED > hdd80 TiB 60 TiB 20 TiB20 TiB 25.45 > TOTAL 80 TiB 60 TiB 20 TiB20 TiB 25.45 > > --- POOLS --- > POOLID PGS STORED OBJECTS USED %USED MAX > AVAIL > .mgr 11 152 MiB 38 457 MiB 0 > 9.2 TiB > 21BadPool3 328 KiB1 12 KiB 0 > 18 TiB > .rgw.root4 32 1.3 KiB4 48 KiB 0 > 9.2 TiB > default.rgw.log 5 32 3.6 KiB 209 408 KiB 0 > 9.2 TiB > default.rgw.control 6 32 0 B8 0 B 0 > 9.2 TiB > default.rgw.meta 78 6.7 KiB 20 203 KiB 0 > 9.2 TiB > rbd_rep_pool 8 32 2.0 MiB5 5.9 MiB 0 > 9.2 TiB > default.rgw.buckets.index98 2.0 MiB 33 5.9 MiB 0 > 9.2 TiB > default.rgw.buckets.non-ec 10 32 1.4 KiB0 4.3 KiB 0 > 9.2 TiB > default.rgw.buckets.data11 32 232 GiB 61.02k 697 GiB 2.41 > 9.2 TiB > EC-22-Pool 12 128 9.8 TiB2.39M 20 TiB 41.55 > 14 TiB > > > > > Maybe provide the output of the first two commands for > > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a > bit > > after setting these and then collect the output). Include the applied > > values for osd_max_backfills* and osd_recovery_max_active* for one of the > > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e > > osd_recovery_max_active). > > > > I didn't notice any speed difference with sleep values changed, but I'll > grab the stats between changes when I have a chance. > > ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active' > osd_max_backfills1000 > > > override mon[5] > osd_recovery_max_active 1000 > > > override > osd_recovery_max_active_hdd 1000 > > > override mon[5] > osd_recovery_max_active_ssd 1000 > > > override > > > > > I don't really know if on such a small cluster one can expect more than > > what you see. It has nothing to do with network speed if you have a 10G > > line. However, recovery is something completely different from a full > > link-speed copy. > > > > I can tell you that boatloads of tiny objects are a huge pain for > > recovery, even on SSD. Ceph doesn't raid up sections of disks against > each > > other, but object for object. 
This might be a feature request: that PG > > space allocation and recovery should follow the model of LVM extends > > (ideally match with LVM extends) to allow recovery/rebalancing larger > > chunks of storage in one go, containing parts of a large or many small > > objects. > > > > Best regards, > > = > > Frank Schilder > > AIT Risø Campus > > Bygning 109, rum S14 > > > > > > From: Curt > > Sent: 27 June 2022 17:35:19 > > To: Frank Schilder > > Cc: ceph-users@ceph.io > > Subject: Re: [ceph-users] Re: Ceph recovery network speed > > > > Hello, > > > > I had already increased/changed those variables previously. I increased > > the pg_num to 128. Which increased the number of PG's backfilling, but > > speed is still only at 30 MiB/s avg and has been backfilling 23 pg for > the > > last several hours. Should I increase it higher than 128? > > >
[ceph-users] Re: Ceph recovery network speed
I would love to see those types of speeds. I tried setting it all the way to 0 and nothing, I did that before I sent the first email, maybe it was your old post I got it from. osd_recovery_sleep_hdd 0.00 override (mon[0.00]) On Mon, Jun 27, 2022 at 9:27 PM Robert Gallop wrote: > I saw a major boost after having the sleep_hdd set to 0. Only after that > did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k > obj/sec. > > Eventually it tapered back down, but for me sleep was the key, and > specifically in my case: > > osd_recovery_sleep_hdd > > On Mon, Jun 27, 2022 at 11:17 AM Curt wrote: > >> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder wrote: >> >> > I think this is just how ceph is. Maybe you should post the output of >> > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an >> > idea whether what you look at is expected or not. As I wrote before, >> object >> > recovery is throttled and the recovery bandwidth depends heavily on >> object >> > size. The interesting question is, how many objects per second are >> > recovered/rebalanced >> > >> data: >> pools: 11 pools, 369 pgs >> objects: 2.45M objects, 9.2 TiB >> usage: 20 TiB used, 60 TiB / 80 TiB avail >> pgs: 512136/9729081 objects misplaced (5.264%) >> 343 active+clean >> 22 active+remapped+backfilling >> >> io: >> client: 2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr >> recovery: 34 MiB/s, 8 objects/s >> >> Pool 12 is the only one with any stats. >> >> pool EC-22-Pool id 12 >> 510048/9545052 objects misplaced (5.344%) >> recovery io 36 MiB/s, 9 objects/s >> client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr >> >> --- RAW STORAGE --- >> CLASSSIZE AVAILUSED RAW USED %RAW USED >> hdd80 TiB 60 TiB 20 TiB20 TiB 25.45 >> TOTAL 80 TiB 60 TiB 20 TiB20 TiB 25.45 >> >> --- POOLS --- >> POOLID PGS STORED OBJECTS USED %USED MAX >> AVAIL >> .mgr 11 152 MiB 38 457 MiB 0 >> 9.2 TiB >> 21BadPool3 328 KiB1 12 KiB 0 >> 18 TiB >> .rgw.root4 32 1.3 KiB4 48 KiB 0 >> 9.2 TiB >> default.rgw.log 5 32 3.6 KiB 209 408 KiB 0 >> 9.2 TiB >> default.rgw.control 6 32 0 B8 0 B 0 >> 9.2 TiB >> default.rgw.meta 78 6.7 KiB 20 203 KiB 0 >> 9.2 TiB >> rbd_rep_pool 8 32 2.0 MiB5 5.9 MiB 0 >> 9.2 TiB >> default.rgw.buckets.index98 2.0 MiB 33 5.9 MiB 0 >> 9.2 TiB >> default.rgw.buckets.non-ec 10 32 1.4 KiB0 4.3 KiB 0 >> 9.2 TiB >> default.rgw.buckets.data11 32 232 GiB 61.02k 697 GiB 2.41 >> 9.2 TiB >> EC-22-Pool 12 128 9.8 TiB2.39M 20 TiB 41.55 >> 14 TiB >> >> >> >> > Maybe provide the output of the first two commands for >> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a >> bit >> > after setting these and then collect the output). Include the applied >> > values for osd_max_backfills* and osd_recovery_max_active* for one of >> the >> > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e >> > osd_recovery_max_active). >> > >> >> I didn't notice any speed difference with sleep values changed, but I'll >> grab the stats between changes when I have a chance. >> >> ceph config show osd.19 | egrep >> 'osd_max_backfills|osd_recovery_max_active' >> osd_max_backfills1000 >> >> >> override mon[5] >> osd_recovery_max_active 1000 >> >> >> override >> osd_recovery_max_active_hdd 1000 >> >> >> override mon[5] >> osd_recovery_max_active_ssd 1000 >> >> >> override >> >> > >> > I don't really know if on such a small cluster one can expect more than >> > what you see. It has nothing to do with network speed if you have a 10G >> > line. 
However, recovery is something completely different from a full >> > link-speed copy. >> > >> > I can tell you that boatloads of tiny objects are a huge pain for >> > recovery, even on SSD. Ceph doesn't raid up sections of disks against >> each >> > other, but object for object. This might be a feature request: that PG >> > space allocation and recovery should follow the model of LVM extends >> > (ideally match with LVM extends) to allow recovery/rebalancing larger >> > chunks of storage in one go, containing parts of a large or many small >> > objects. >> > >> > Best regards, >> > = >> > Frank Schilder >> > AIT Risø Campus >> > Bygning 109, rum S14 >> > >> >
[ceph-users] calling ceph command from a crush_location_hook - fails to find sys.stdin.isatty()
[ceph pacific 16.2.9] I have a crush_location_hook script which is a small python3 script that figures out the correct root/chassis/host location for a particular OSD. Our map has 2 roots, one for an all-SSD, and another for HDDs, thus the need for the location hook. Without it, the SSD devices end up in the wrong crush location. Prior to 16.2.9 release, they weren't being used because of a bug that was causing the OSDs to crash with the hook. Now that we've upgraded to 16.2.9 we want to use our location hook script again, but it fails in a different way. The script works correctly when testing it standalone with the right parameters, but when it is called by the OSD process, it fails because when the ceph command references 'sys.stdin.isatty()' (at line 538 in /usr/bin/ceph), it isn't found because sys.stdin is NoneType. I suspect this is because of how the OSD spawns the crush hook script, which then forks the ceph command. Somehow python (3.8) is not initializing the stdin, stdout, stderr members in the 'sys' module object. Looking for guidance on how to get my location hook script to successfully use the "ceph" command to get the output of "ceph osd tree --format json" thanks, Wyllys Ingersoll ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
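One workaround for the missing stdin is to make sure the spawned ceph CLI gets a real (empty) stdin, e.g. by redirecting from /dev/null. Below is a minimal shell-based hook illustrating the idea; it is a sketch only: the argument format is assumed to be the documented "--cluster NAME --id ID --type TYPE" form, the class-to-root mapping and host naming are placeholders for the real two-root layout, and it uses "ceph osd crush get-device-class" rather than parsing the full "ceph osd tree --format json" output.

#!/bin/sh
# hypothetical crush_location_hook sketch: choose a root based on the OSD's device class
while [ $# -gt 0 ]; do
    case "$1" in
        --id) OSD_ID="$2"; shift ;;
    esac
    shift
done

# redirect stdin from /dev/null so the ceph CLI does not trip over sys.stdin being None
CLASS=$(ceph osd crush get-device-class "osd.${OSD_ID}" </dev/null 2>/dev/null)

if [ "$CLASS" = "ssd" ]; then
    echo "root=ssd host=$(hostname -s)-ssd"
else
    echo "root=default host=$(hostname -s)"
fi

In a Python hook the equivalent is to launch the ceph command via subprocess.run(..., stdin=subprocess.DEVNULL), which gives the child a valid empty stdin instead of None.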
[ceph-users] Re: Ceph recovery network speed
On Mon, Jun 27, 2022 at 11:08 PM Frank Schilder wrote: > Do you, by any chance have SMR drives? This may not be stated on the > drive, check what the internet has to say. I also would have liked to see > the beginning of the ceph status, number of hosts, number of OSDs, up in > down whatever. Can you also send the result of ceph osd df tree? > As far as I can tell none of the drives are SMR drives. I did have some inconsistent pop up, scrubs are still running. cluster: id: 1684fe88-aae0-11ec-9593-df430e3982a0 health: HEALTH_ERR 10 scrub errors Possible data damage: 4 pgs inconsistent services: mon: 5 daemons, quorum cephmgr,cephmon1,cephmon2,cephmon3,cephmgr2 (age 8w) mgr: cephmon1.fxtvtu(active, since 2d), standbys: cephmon2.wrzwwn, cephmgr2.hzsrdo, cephmgr.bazebq osd: 44 osds: 44 up (since 3d), 44 in (since 3d); 28 remapped pgs rgw: 2 daemons active (2 hosts, 1 zones) data: pools: 11 pools, 369 pgs objects: 2.45M objects, 9.2 TiB usage: 21 TiB used, 59 TiB / 80 TiB avail pgs: 503944/9729081 objects misplaced (5.180%) 337 active+clean 28 active+remapped+backfilling 4 active+clean+inconsistent io: client: 1000 KiB/s rd, 717 KiB/s wr, 81 op/s rd, 57 op/s wr recovery: 34 MiB/s, 8 objects/s ID CLASS WEIGHTREWEIGHT SIZE RAW USE DATA OMAP META AVAIL%USE VAR PGS STATUS TYPE NAME -1 80.05347 - 80 TiB21 TiB21 TiB 32 MiB 69 GiB 59 TiB 26.23 1.00- root default -5 20.01337 - 20 TiB 5.3 TiB 5.3 TiB 1.4 MiB 19 GiB 15 TiB 26.47 1.01- host hyperion01 1hdd 1.81940 1.0 1.8 TiB 749 GiB 747 GiB 224 KiB 2.2 GiB 1.1 TiB 40.19 1.53 36 up osd.1 3hdd 1.81940 1.0 1.8 TiB 531 GiB 530 GiB3 KiB 1.9 GiB 1.3 TiB 28.52 1.09 31 up osd.3 5hdd 1.81940 1.0 1.8 TiB 167 GiB 166 GiB 36 KiB 1.2 GiB 1.7 TiB 8.98 0.34 18 up osd.5 7hdd 1.81940 1.0 1.8 TiB 318 GiB 316 GiB 83 KiB 1.2 GiB 1.5 TiB 17.04 0.65 26 up osd.7 9hdd 1.81940 1.0 1.8 TiB 1017 GiB 1014 GiB 139 KiB 2.6 GiB 846 GiB 54.59 2.08 38 up osd.9 11hdd 1.81940 1.0 1.8 TiB 569 GiB 567 GiB4 KiB 2.1 GiB 1.3 TiB 30.56 1.17 29 up osd.11 13hdd 1.81940 1.0 1.8 TiB 293 GiB 291 GiB 338 KiB 1.5 GiB 1.5 TiB 15.72 0.60 23 up osd.13 15hdd 1.81940 1.0 1.8 TiB 368 GiB 366 GiB 641 KiB 1.6 GiB 1.5 TiB 19.74 0.75 23 up osd.15 17hdd 1.81940 1.0 1.8 TiB 369 GiB 367 GiB2 KiB 1.5 GiB 1.5 TiB 19.80 0.75 26 up osd.17 19hdd 1.81940 1.0 1.8 TiB 404 GiB 403 GiB7 KiB 1.1 GiB 1.4 TiB 21.69 0.83 31 up osd.19 45hdd 1.81940 1.0 1.8 TiB 639 GiB 637 GiB2 KiB 2.0 GiB 1.2 TiB 34.30 1.31 32 up osd.45 -3 20.01337 - 20 TiB 5.2 TiB 5.2 TiB 2.0 MiB 18 GiB 15 TiB 26.15 1.00- host hyperion02 0hdd 1.81940 1.0 1.8 TiB 606 GiB 604 GiB 302 KiB 2.0 GiB 1.2 TiB 32.52 1.24 33 up osd.0 2hdd 1.81940 1.0 1.8 TiB58 GiB58 GiB 112 KiB 249 MiB 1.8 TiB 3.14 0.12 14 up osd.2 4hdd 1.81940 1.0 1.8 TiB 254 GiB 252 GiB 14 KiB 1.6 GiB 1.6 TiB 13.63 0.52 28 up osd.4 6hdd 1.81940 1.0 1.8 TiB 574 GiB 572 GiB1 KiB 1.8 GiB 1.3 TiB 30.81 1.17 26 up osd.6 8hdd 1.81940 1.0 1.8 TiB 201 GiB 200 GiB 618 KiB 743 MiB 1.6 TiB 10.77 0.41 23 up osd.8 10hdd 1.81940 1.0 1.8 TiB 628 GiB 626 GiB4 KiB 2.2 GiB 1.2 TiB 33.72 1.29 37 up osd.10 12hdd 1.81940 1.0 1.8 TiB 355 GiB 353 GiB 361 KiB 1.2 GiB 1.5 TiB 19.03 0.73 30 up osd.12 14hdd 1.81940 1.0 1.8 TiB 1.1 TiB 1.1 TiB1 KiB 2.7 GiB 708 GiB 62.00 2.36 38 up osd.14 16hdd 1.81940 1.0 1.8 TiB 240 GiB 239 GiB4 KiB 1.2 GiB 1.6 TiB 12.90 0.49 20 up osd.16 18hdd 1.81940 1.0 1.8 TiB 300 GiB 298 GiB 542 KiB 1.6 GiB 1.5 TiB 16.08 0.61 21 up osd.18 32hdd 1.81940 1.0 1.8 TiB 989 GiB 986 GiB 45 KiB 2.7 GiB 874 GiB 53.09 2.02 36 up osd.32 -7 20.01337 - 20 TiB 5.2 TiB 5.2 TiB 2.9 MiB 17 GiB 15 TiB 26.06 0.99- host 
hyperion03 22hdd 1.81940 1.0 1.8 TiB 449 GiB 448 GiB 443 KiB 1.5 GiB 1.4 TiB 24.10 0.92 31 up osd.22 23hdd 1.81940 1.0 1.8 TiB 299 GiB 298 GiB5 Ki