[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster
Thanks! As far as I can see, this is the same problem as mine.

On Wed, 15 Dec 2021 at 16:49, Chris Dunlop wrote:
> On Wed, Dec 15, 2021 at 02:05:05PM +1000, Michael Uleysky wrote:
> > I am trying to upgrade a three-node Nautilus cluster to Pacific. I am updating
> > ceph on one node and restarting the daemons. The OSDs are fine, but the monitor
> > cannot enter quorum.
>
> Sounds like the same thing as:
>
> Pacific mon won't join Octopus mons
> https://tracker.ceph.com/issues/52488
>
> Unfortunately there's no resolution.
>
> For a bit more background, see also the thread starting:
>
> New pacific mon won't join with octopus mons
> https://www.spinics.net/lists/ceph-devel/msg52181.html
[ceph-users] How to clean up data in OSDs
Hello Team,

After testing our cluster we removed and recreated all Ceph pools, which cleaned up all users and buckets, but we can still see data on the disks. Is there an easy way to clean up all OSDs without actually removing and reconfiguring them? What would be the best way to solve this problem? Currently we are experiencing RGW daemon crashes because RGW still tries to look up the old buckets.

Any help is much appreciated.

Regards,
Akkina
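A rough sketch of the commands this kind of cleanup usually involves, in case it helps frame the question -- the pool name below is only an example, and pool deletion has to be explicitly enabled first; checking per-pool usage shows which pools still account for the space:

# ceph df detail
# rados df
# ceph config set mon mon_allow_pool_delete true
# ceph osd pool rm default.rgw.buckets.data default.rgw.buckets.data --yes-i-really-really-mean-it

Space freed by deleted pools is reclaimed asynchronously by the OSDs, so usage at the OSD level can lag behind for a while.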
[ceph-users] what does "Message has implicit destination" mean
The message is being held because: Message has implicit destination
[ceph-users] Snapshot mirroring problem
Hi

I'm having trouble getting snapshot replication to work. I have 2 clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph -> dcn-ceph.

Adding the peer:

"
# rbd mirror pool info
Mode: image
Site Name: dcn-ceph
Peer Sites: none

# rbd --cluster dcn-ceph mirror pool peer bootstrap import --direction rx-only --site-name dcn-ceph rbd /tmp/token
2021-12-15T08:24:20.250+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-15T08:24:20.251+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-15T08:24:20.251+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory

# dcn-ceph-01/root tmp # rbd mirror pool info
Mode: image
Site Name: dcn-ceph
Peer Sites:
UUID: cd68b1bb-3e0b-4f9d-bd52-4ff5804c9746
Name: 714-ceph
Direction: rx-only
Client: client.rbd-mirror-peer
"

I presume the error is benign, as per https://bugzilla.redhat.com/show_bug.cgi?id=1981186, and the peer relation seems to be established. On 714-ceph:

"
# rbd mirror pool info
Mode: image
Site Name: 714-ceph
Peer Sites:
UUID: cabf78ce-f65f-4a27-a648-20b3fd326647
Name: dcn-ceph
Mirror UUID: 4132f9e2-555f-4363-8c62-72f0db37f700
Direction: tx-only

# rbd mirror pool info rbd
Mode: image
Site Name: 714-ceph
Peer Sites:
UUID: cabf78ce-f65f-4a27-a648-20b3fd326647
Name: dcn-ceph
Mirror UUID: 4132f9e2-555f-4363-8c62-72f0db37f700
Direction: tx-only

rbd mirror image enable rbd/rbdmirrortest snapshot

# rbd info rbd/rbdmirrortest
rbd image 'rbdmirrortest':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 46e06591893e12
block_name_prefix: rbd_data.46e06591893e12
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Tue Dec 14 08:46:59 2021
access_timestamp: Tue Dec 14 10:30:01 2021
modify_timestamp: Tue Dec 14 08:46:59 2021
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 3c92991f-b0ae-496f-adb6-f2f25cbb2220
mirroring primary: true

# rbd mirror image status rbd/rbdmirrortest
rbdmirrortest:
global_id: 3c92991f-b0ae-496f-adb6-f2f25cbb2220
snapshots:
109 .mirror.primary.3c92991f-b0ae-496f-adb6-f2f25cbb2220.6d267de5-e9db-4dfd-b626-a4df87ad2485 (peer_uuids:[cabf78ce-f65f-4a27-a648-20b3fd326647])
"

Looks good? At the dcn-ceph end the image is created, but then nothing happens:

"
# rbd info rbd/rbdmirrortest
rbd image 'rbdmirrortest':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 8e3863f675783d
data_pool: rbd_data
block_name_prefix: rbd_data.4.8e3863f675783d
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool, non-primary
op_features:
flags:
create_timestamp: Wed Dec 15 08:25:20 2021
access_timestamp: Wed Dec 15 08:25:20 2021
modify_timestamp: Wed Dec 15 08:25:20 2021
mirroring state: unknown
mirroring mode: snapshot
mirroring global id: 3c92991f-b0ae-496f-adb6-f2f25cbb2220
mirroring primary: false

# rbd mirror image status rbd/rbdmirrortest
rbd: mirroring not enabled on the image
"

Any ideas?

Thanks,
Torkil
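A couple of checks that might narrow this down, assuming the rbd-mirror daemon on dcn-ceph was installed as a package (the systemd unit name is a guess and will differ under cephadm):

# rbd --cluster dcn-ceph mirror pool status rbd --verbose
# journalctl -u ceph-rbd-mirror@<id>

The --verbose pool status should list the rbd-mirror daemons registered against the pool along with a per-image state, which shows whether the daemon ever picked the image up.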
[ceph-users] Re: Snapshot mirroring problem
Hi Torkil,

On 12/15/21 09:45, Torkil Svensgaard wrote:
> I'm having trouble getting snapshot replication to work. I have 2
> clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS
> Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph ->
> dcn-ceph.

I didn't try one-way replication myself with the snapshot mode so I can't say for sure, but there is an issue in 16.2.6 [1]. It has been fixed and backported into 16.2.7; an update to that version may solve your problem!

[1]: https://tracker.ceph.com/issues/52675

Cheers,

--
Arthur Outhenin-Chalandre
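For reference, a minimal sketch of verifying the running versions and moving dcn-ceph to 16.2.7, assuming it is cephadm-managed (the RHEL 714-ceph cluster is RHCS and follows its own update path):

# ceph versions
# ceph orch upgrade start --ceph-version 16.2.7
# ceph orch upgrade status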
[ceph-users] Re: MAX AVAIL capacity mismatch || mimic(13.2)
Hi,

Our total number of HDD OSDs is 40, and 40 x 5.5 TB = 220 TB. We are using 3 replicas for every pool, so "MAX AVAIL" should show roughly 220/3 = 73.3 TB. Am I right? And what is the meaning of "variance 1.x"? I think we might have a wrong configuration, but need to find it. We also have some SSD OSDs, so yes, the total capacity is calculated from HDD + SSD, but the per-pool MAX AVAIL should differ.

# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE VAR PGS
0 hdd 5.57100 1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.74 1.35 871
1 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.25 1.22 840
2 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.53 1.13 831
3 hdd 5.57100 1.0 5.6 TiB 2.2 TiB 3.4 TiB 38.80 1.39 888
4 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.22 1.19 866
5 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.6 TiB 36.12 1.29 837
6 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.12 1.15 858
7 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.9 TiB 29.63 1.06 851
8 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.57 1.20 799
9 hdd 5.57100 1.0 5.6 TiB 1.6 TiB 4.0 TiB 28.73 1.03 793
10 hdd 5.57100 1.0 5.6 TiB 1.6 TiB 3.9 TiB 29.51 1.05 839
11 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.6 TiB 36.19 1.29 860
12 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.61 1.20 904
13 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.52 1.16 807
14 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.17 1.22 845
15 hdd 5.57100 1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.61 1.34 836
16 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.12 1.11 881
17 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.66 1.17 876
18 hdd 5.57100 1.0 5.6 TiB 2.4 TiB 3.2 TiB 42.29 1.51 860
19 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.9 TiB 29.93 1.07 828
20 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.65 1.24 854
21 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.62 1.20 845
22 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.21 1.19 797
23 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.75 1.31 839
24 hdd 5.57100 1.0 5.6 TiB 2.1 TiB 3.5 TiB 36.98 1.32 829
25 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.86 1.10 878
26 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.68 1.31 867
27 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.13 1.11 842
28 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.12 1.15 821
29 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.44 1.19 871
30 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.6 TiB 35.97 1.29 813
31 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.60 1.09 812
32 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.65 1.24 836
33 hdd 5.57100 1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.57 1.13 884
34 hdd 5.57100 1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.67 1.31 829
35 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.79 1.24 900
36 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.76 1.21 838
37 hdd 5.57100 1.0 5.6 TiB 2.1 TiB 3.4 TiB 38.21 1.37 796
38 hdd 5.57100 1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.26 1.12 841
39 hdd 5.57100 1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.76 1.21 830
40 ssd 1.81898 1.0 1.8 TiB 22 GiB 1.8 TiB 1.18 0.04 112
42 ssd 1.81879 1.0 1.8 TiB 21 GiB 1.8 TiB 1.12 0.04 107
43 ssd 1.81879 1.0 1.8 TiB 24 GiB 1.8 TiB 1.27 0.05 121
44 ssd 1.81879 1.0 1.8 TiB 20 GiB 1.8 TiB 1.06 0.04 101
45 ssd 1.81879 1.0 1.8 TiB 23 GiB 1.8 TiB 1.24 0.04 116
46 ssd 1.81879 1.0 1.8 TiB 24 GiB 1.8 TiB 1.27 0.05 120
47 ssd 1.81879 1.0 1.8 TiB 22 GiB 1.8 TiB 1.17 0.04 110
48 ssd 1.81879 1.0 1.8 TiB 23 GiB 1.8 TiB 1.26 0.04 120
49 ssd 1.81879 1.0 1.8 TiB 23 GiB 1.8 TiB 1.21 0.04 117
41 ssd 1.81898 1.0 1.8 TiB 18 GiB 1.8 TiB 0.97 0.03 94
50 ssd 1.81940 1.0 1.8 TiB 22 GiB 1.8 TiB 1.19 0.04 115
51 ssd 1.81940 1.0 1.8 TiB 19 GiB 1.8 TiB 1.03 0.04 98
52 ssd 1.81940 1.0 1.8 TiB 22 GiB 1.8 TiB 1.16 0.04 109
53 ssd 1.81940 1.0 1.8 TiB 21 GiB 1.8 TiB 1.13 0.04 105
54 ssd 1.81940 1.0 1.8 TiB 25 GiB 1.8 TiB 1.36 0.05 128
55 ssd 1.81940 1.0 1.8 TiB 22 GiB 1.8 TiB 1.19 0.04 113
56 ssd 1.81940 1.0 1.8 TiB 27 GiB 1.8 TiB 1.43 0.05 140
57 ssd 1.81940 1.0 1.8 TiB 24 GiB 1.8 TiB 1.29 0.05 122
58 ssd 1.81940 1.0 1.8 TiB 21 GiB 1.8 TiB 1.13 0.04 107
59 ssd 1.81940 1.0 1.8 TiB 21 GiB 1.8 TiB 1.12 0.04 111
60 ssd 1.81940 1.0 1.8 TiB 27 GiB 1.8 TiB 1.45 0.05 137
61 ssd 1.81940 1.0 1.8 TiB 23 GiB 1.8 TiB 1.24 0.04 117
62 ssd 1.81940 1.0 1.8 TiB 22 GiB 1.8 TiB 1.16 0.04 112
63 ssd 1.81940 1.0 1.8 TiB 25 GiB 1.8 TiB 1.32 0.05 126
64 ssd 1.81940 1.0 1.8 TiB 23 GiB 1.8 TiB 1.23 0.04 115
65 ssd 1.81940 1.0 1.8 TiB 20 GiB 1.8 TiB 1.07 0.04 99
66 ssd 1.81940 1.0 1.8 TiB 19 GiB 1.8 TiB 1.03 0.04 100
TOTAL 272 TiB 76 TiB 196 TiB 27.99

# ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
272 TiB 196 TiB 76 TiB 28.02
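For what it's worth, MAX AVAIL is reported per pool and is derived from the CRUSH rule the pool uses (roughly: the space of the OSDs that rule can reach, divided by the replica count and capped by the fullest of those OSDs), and VAR is each OSD's utilisation relative to the cluster average. A quick way to confirm which rule and device class each pool maps to -- the pool and rule names below are placeholders:

# ceph osd pool get <poolname> crush_rule
# ceph osd crush rule dump <rulename>
# ceph df detail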
[ceph-users] RBD mirroring bootstrap peers - direction
Hi

I'm confused by the direction parameter in the documentation[1]. If I have my data at site-a and want one-way replication to site-b, should the mirroring be configured as in the documentation example, direction-wise? E.g.:

rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool
(get token)
rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b --direction rx-only image-pool token

Best regards,
Torkil

[1] https://docs.ceph.com/en/latest/rbd/rbd-mirroring/#bootstrap-peers
[ceph-users] Re: what does "Message has implicit destination" mean
On Wed, 15 Dec 2021 at 09:35, Marc wrote:
> The message is being held because:
>
> Message has implicit destination

Usually it's something like "the mailing list wasn't in the To: field, but only CC: or BCC:".

--
May the most significant bit of your life be positive.
[ceph-users] Re: RBD mirroring bootstrap peers - direction
Hi Torkil,

On 12/15/21 13:24, Torkil Svensgaard wrote:
> I'm confused by the direction parameter in the documentation[1]. If I
> have my data at site-a and want one-way replication to site-b, should the
> mirroring be configured as in the documentation example, direction-wise?

What you are describing seems right at first glance. The rbd-mirror daemon's semantic is to replicate data from a remote cluster to the local cluster. But I am not sure, as I use rx-tx everywhere... The default rx-tx would probably also work in your case, as long as you don't try to run rbd-mirror on site-a.

Cheers,

--
Arthur Outhenin-Chalandre
[ceph-users] Re: RBD mirroring bootstrap peers - direction
On 15/12/2021 13.44, Arthur Outhenin-Chalandre wrote:
> Hi Torkil,

Hi Arthur

> On 12/15/21 13:24, Torkil Svensgaard wrote:
>> I'm confused by the direction parameter in the documentation[1]. If I
>> have my data at site-a and want one-way replication to site-b, should the
>> mirroring be configured as in the documentation example, direction-wise?
>
> What you are describing seems right at first glance. The rbd-mirror daemon's
> semantic is to replicate data from a remote cluster to the local cluster.
> But I am not sure, as I use rx-tx everywhere... The default rx-tx would
> probably also work in your case, as long as you don't try to run rbd-mirror
> on site-a.

Ah, so as long as I don't run the mirror daemons on site-a there is no risk of overwriting production data there?

I'm upgrading to 16.2.7 as you suggested in the other thread[1]. If that doesn't fix the issue, I found another thread[2] suggesting the direction should be reversed, but that sounded a bit scary.

Thanks,
Torkil

[1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MGG7X5ITC4XA3JREAWU74DDEZTWLSSZE/
[2] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NOFX6TXZ7WRUV2ZSTI4N6EP73YN6JKQQ/
[ceph-users] Re: RBD mirroring bootstrap peers - direction
Hi Torkil,

I would recommend sticking to rx-tx to make a potential failback to the primary cluster easier. There shouldn't be any issue with running rbd-mirror daemons at both sites either -- a daemon doesn't start replicating until it is instructed to, either per-pool or per-image.

Thanks,
Ilya
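For reference, a sketch of the rx-tx variant of the bootstrap from the documentation example in the original post (same placeholder site and pool names; rx-tx is also the default when --direction is omitted):

rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool > /tmp/token
rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b --direction rx-tx image-pool /tmp/token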
[ceph-users] Re: Snapshot mirroring problem
On 15/12/2021 10.17, Arthur Outhenin-Chalandre wrote:
> Hi Torkil,

Hi Arthur

> On 12/15/21 09:45, Torkil Svensgaard wrote:
>> I'm having trouble getting snapshot replication to work. I have 2
>> clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS
>> Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph ->
>> dcn-ceph.
>
> I didn't try one-way replication myself with the snapshot mode so I can't
> say for sure, but there is an issue in 16.2.6 [1]. It has been fixed and
> backported into 16.2.7; an update to that version may solve your problem!
>
> [1]: https://tracker.ceph.com/issues/52675

Thanks, that did the trick =)

Best regards,
Torkil
[ceph-users] Re: RBD mirroring bootstrap peers - direction
On 15/12/2021 13.58, Ilya Dryomov wrote:
> Hi Torkil,

Hi Ilya

> I would recommend sticking to rx-tx to make a potential failback to the
> primary cluster easier. There shouldn't be any issue with running
> rbd-mirror daemons at both sites either -- a daemon doesn't start
> replicating until it is instructed to, either per-pool or per-image.

Thanks for the clarification.

Best regards,
Torkil
[ceph-users] Re: RBD mirroring bootstrap peers - direction
On 12/15/21 13:50, Torkil Svensgaard wrote:
> Ah, so as long as I don't run the mirror daemons on site-a there is no
> risk of overwriting production data there?

To be perfectly clear, there should be no risk whatsoever (as Ilya also said). I suggested not running rbd-mirror on site-a so that replication from site-b to site-a wouldn't be a thing at all. That being said, we also run a setup where we only need one-way replication, but for the same reasons posted by Ilya we use rx-tx and run rbd-mirror at both sites.

Cheers,

--
Arthur Outhenin-Chalandre
[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster
Hmm, that ticket came from the slightly unusual scenario where you were deploying a *new* Pacific monitor against an Octopus cluster.

Michael, is your cluster deployed with cephadm? And is this a new or previously-existing monitor?

On Wed, Dec 15, 2021 at 12:09 AM Michael Uleysky wrote:
>
> Thanks!
>
> As far as I can see, this is the same problem as mine.
>
> On Wed, 15 Dec 2021 at 16:49, Chris Dunlop wrote:
>
> > On Wed, Dec 15, 2021 at 02:05:05PM +1000, Michael Uleysky wrote:
> > > I am trying to upgrade a three-node Nautilus cluster to Pacific. I am updating
> > > ceph on one node and restarting the daemons. The OSDs are fine, but the monitor
> > > cannot enter quorum.
> >
> > Sounds like the same thing as:
> >
> > Pacific mon won't join Octopus mons
> > https://tracker.ceph.com/issues/52488
> >
> > Unfortunately there's no resolution.
> >
> > For a bit more background, see also the thread starting:
> >
> > New pacific mon won't join with octopus mons
> > https://www.spinics.net/lists/ceph-devel/msg52181.html
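A few commands that may help answer that and show where the new mon is stuck (the orch one only applies if the cluster is cephadm-managed):

# ceph versions
# ceph mon stat
# ceph mon dump
# ceph orch ps --daemon-type mon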
[ceph-users] Large latency for single thread
I created an RBD pool using only two SATA SSDs (one for data, the other for the DB/WAL) and set the replica size to 1. After that, I ran a fio test on the same host the OSD is placed on. I found the latency is several hundred microseconds (sixty microseconds for the raw SATA SSD). The fio output:

m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
  write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
    slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
    clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
     lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
     | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
     | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
     | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
     | 99.99th=[ 6718]
   bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
   iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
  lat (usec)   : 750=2.98%, 1000=22.41%
  lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
  cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Parts of the OSD's perf status:

    "state_io_done_lat": {
        "avgcount": 151295,
        "sum": 0.336297058,
        "avgtime": 0.0
    },
    "state_kv_queued_lat": {
        "avgcount": 151295,
        "sum": 18.812333051,
        "avgtime": 0.000124342
    },
    "state_kv_commiting_lat": {
        "avgcount": 151295,
        "sum": 64.555436175,
        "avgtime": 0.000426685
    },
    "state_kv_done_lat": {
        "avgcount": 151295,
        "sum": 0.130403628,
        "avgtime": 0.00861
    },
    "state_deferred_queued_lat": {
        "avgcount": 148,
        "sum": 215.726286547,
        "avgtime": 1.457610044
    },
    ... ...
    "op_w_latency": {
        "avgcount": 151133,
        "sum": 130.134246667,
        "avgtime": 0.000861057
    },
    "op_w_process_latency": {
        "avgcount": 151133,
        "sum": 125.301196872,
        "avgtime": 0.000829079
    },
    "op_w_prepare_latency": {
        "avgcount": 151133,
        "sum": 29.892687947,
        "avgtime": 0.000197790
    },

Is this reasonable for this benchmark test case? And how can it be improved? It's really NOT friendly for single-threaded workloads.
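For context, a fio job roughly along these lines would match the output above; the librbd ioengine, client name and image name are assumptions, and the test could equally have been run against a mapped krbd device:

fio --name=m-seqwr-004k-001q-001j --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=testimg --rw=write --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=180 --time_based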
[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster
On 15.12.21 05:59, Linh Vu wrote:
> May not be directly related to your error, but they slap a DO NOT UPGRADE
> FROM AN OLDER VERSION label on the Pacific release notes for a reason...
> https://docs.ceph.com/en/latest/releases/pacific/

This is an unrelated issue (bluestore_fsck_quick_fix_on_mount) that has been fixed with 16.2.7. This page should be updated.

The 16.2.7 release is currently not in the release index. Is there a reason for that?

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
[ceph-users] Re: Large latency for single thread
Is this not just inherent to SDS? And wait for the new osd code, I think they are working on it. https://yourcmc.ru/wiki/Ceph_performance > > m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 > 14:05:32 2021 > write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone > resets > slat (usec): min=4, max=123, avg=22.30, stdev= 9.18 > clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67 > lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99 > clat percentiles (usec): > | 1.00th=[ 709], 5.00th=[ 775], 10.00th=[ 824], > 20.00th=[ 906], > | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ > 1303], > | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ > 1663], > | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ > 3949], > | 99.99th=[ 6718] > bw ( KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, > stdev=588.79, samples=360 > iops : min= 482, max= 1262, avg=794.76, stdev=147.20, > samples=360 > lat (usec) : 750=2.98%, 1000=22.41% > lat (msec) : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01% > cpu : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2 > IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, > >=64=0.0% > submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, > >=64=0.0% > issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0 > latency : target=0, window=0, percentile=100.00%, depth=1 > > > Parts of the OSD' perf status: > > "state_io_done_lat": { > "avgcount": 151295, > "sum": 0.336297058, > "avgtime": 0.0 > }, > "state_kv_queued_lat": { > "avgcount": 151295, > "sum": 18.812333051, > "avgtime": 0.000124342 > }, > "state_kv_commiting_lat": { > "avgcount": 151295, > "sum": 64.555436175, > "avgtime": 0.000426685 > }, > "state_kv_done_lat": { > "avgcount": 151295, > "sum": 0.130403628, > "avgtime": 0.00861 > }, > "state_deferred_queued_lat": { > "avgcount": 148, > "sum": 215.726286547, > "avgtime": 1.457610044 > }, > > ... ... > > "op_w_latency": { > "avgcount": 151133, > "sum": 130.134246667, > "avgtime": 0.000861057 > }, > "op_w_process_latency": { > "avgcount": 151133, > "sum": 125.301196872, > "avgtime": 0.000829079 > }, > "op_w_prepare_latency": { > "avgcount": 151133, > "sum": 29.892687947, > "avgtime": 0.000197790 > }, > > Is it reasonable for the benchmark test case? And how to improve it? > It's really NOT friendly for single thread. > > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Large latency for single thread
FWIW, we ran single OSD, iodepth=1 O_DSYNC write tests against classic and crimson bluestore OSDs in our Q3 crimson slide deck. You can see the results starting on slide 32 here: https://docs.google.com/presentation/d/1eydyAFKRea8n-VniQzXKW8qkKM9GLVMJt2uDjipJjQA/edit#slide=id.gf880cf6296_1_73 That was with the OSD restricted to 2 cores, but for these tests it shouldn't really matter. Also keep in mind that the fio client was on localhost as well. Note that Crimson is less efficient than the classic OSD in this test (while being more efficient in other tests) because the reactor is working in a tight loop to reduce latency and since the OSD isn't doing a ton of IO that ends up dominating in terms of CPU usage. Seastar provides an option to have the reactor be a bit more lazy that lowers idle CPU consumption but we don't utilize it yet. Running with replication across mulitple OSDs (that requires round trips to mulitple replicas) does make this tougher to do well on a real cluster. I suspect that long term crimson should be better at this kind of workload vs classic, but with synchronous replication we're always going to be fighting against the slowest link. Mark On 12/15/21 12:44 PM, Marc wrote: Is this not just inherent to SDS? And wait for the new osd code, I think they are working on it. https://yourcmc.ru/wiki/Ceph_performance m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021 write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets slat (usec): min=4, max=123, avg=22.30, stdev= 9.18 clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67 lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99 clat percentiles (usec): | 1.00th=[ 709], 5.00th=[ 775], 10.00th=[ 824], 20.00th=[ 906], | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303], | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663], | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949], | 99.99th=[ 6718] bw ( KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360 iops : min= 482, max= 1262, avg=794.76, stdev=147.20, samples=360 lat (usec) : 750=2.98%, 1000=22.41% lat (msec) : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01% cpu : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Parts of the OSD' perf status: "state_io_done_lat": { "avgcount": 151295, "sum": 0.336297058, "avgtime": 0.0 }, "state_kv_queued_lat": { "avgcount": 151295, "sum": 18.812333051, "avgtime": 0.000124342 }, "state_kv_commiting_lat": { "avgcount": 151295, "sum": 64.555436175, "avgtime": 0.000426685 }, "state_kv_done_lat": { "avgcount": 151295, "sum": 0.130403628, "avgtime": 0.00861 }, "state_deferred_queued_lat": { "avgcount": 148, "sum": 215.726286547, "avgtime": 1.457610044 }, ... ... "op_w_latency": { "avgcount": 151133, "sum": 130.134246667, "avgtime": 0.000861057 }, "op_w_process_latency": { "avgcount": 151133, "sum": 125.301196872, "avgtime": 0.000829079 }, "op_w_prepare_latency": { "avgcount": 151133, "sum": 29.892687947, "avgtime": 0.000197790 }, Is it reasonable for the benchmark test case? And how to improve it? It's really NOT friendly for single thread. 
[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?
Thanks Linh Vu, so it sounds like i should be prepared to bounce the OSDs and/or Hosts, but I haven't heard anyone yet say that it won't work, so I guess there's that... On Tue, Dec 14, 2021 at 7:48 PM Linh Vu wrote: > I haven't tested this in Nautilus 14.2.22 (or any nautilus) but in > Luminous or older, if you go from a bigger size to a smaller size, there > was either a bug or a "feature-not-bug" that didn't allow the OSDs to > automatically purge the redundant PGs with data copies. I did this on a > size=5 to size=3 situation in a 1000+ OSD cluster, and also just recently > in a test Luminous cluster (size=3 to size=2). In order for the purge to > actually happen, I had to restart every OSD (one at a time for safety, or > just run ceph-ansible site.yml with the osd handler health check = true). > > On Wed, Dec 15, 2021 at 8:58 AM Marco Pizzolo > wrote: > >> Hi Martin, >> >> Agreed on the min_size of 2. I have no intention of worrying about uptime >> in event of a host failure. Once size of 2 is effectuated (and I'm unsure >> how long it will take), it is our intention to evacuate all OSDs in one of >> 4 hosts, in order to migrate the host to the new cluster, where its OSDs >> will then be added in. Once added and balanced, we will complete the >> copies (<3 days) and then migrate one more host allowing us to bring size >> to 3. Once balanced, we will collapse the last 2 nodes into the new >> cluster. I am hoping that inclusive of rebalancing the whole project will >> only take 3 weeks, but time will tell. >> >> Has anyone asked Ceph to reduce hundreds of millions if not billions of >> files from size 3 to size 2, and if so, were you successful? I know it >> *should* be able to do this, but sometimes theory and practice don't >> perfectly overlap. >> >> Thanks, >> Marco >> >> On Sat, Dec 11, 2021 at 4:37 AM Martin Verges >> wrote: >> >> > Hello, >> > >> > avoid size 2 whenever you can. As long as you know that you might lose >> > data, it can be an acceptable risk while migrating the cluster. We had >> that >> > in the past multiple time and it is a valid use case in our opinion. >> > However make sure to monitor the state and recover as fast as possible. >> > Leave min_size on 2 as well and accept the potential downtime! >> > >> > -- >> > Martin Verges >> > Managing director >> > >> > Mobile: +49 174 9335695 | Chat: https://t.me/MartinVerges >> > >> > croit GmbH, Freseniusstr. 31h, 81247 Munich >> > CEO: Martin Verges - VAT-ID: DE310638492 >> > Com. register: Amtsgericht Munich HRB 231263 >> > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx >> > >> > >> > On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo >> > wrote: >> > >> >> Hello, >> >> >> >> As part of a migration process where we will be swinging Ceph hosts >> from >> >> one cluster to another we need to reduce the size from 3 to 2 in order >> to >> >> shrink the footprint sufficiently to allow safe removal of an OSD/Mon >> >> node. >> >> >> >> The cluster has about 500M objects as per dashboard, and is about >> 1.5PB in >> >> size comprised solely of small files served through CephFS to Samba. >> >> >> >> Has anyone encountered a similar situation? What (if any) problems did >> >> you >> >> face? >> >> >> >> Ceph 14.2.22 bare metal deployment on Centos. >> >> >> >> Thanks in advance. 
>> >> >> >> Marco >> >> ___ >> >> ceph-users mailing list -- ceph-users@ceph.io >> >> To unsubscribe send an email to ceph-users-le...@ceph.io >> >> >> > >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io >> > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
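For reference, the size change itself is only a couple of commands per pool (the pool name is a placeholder, and the systemctl form assumes the bare-metal, non-containerized install described above); the per-OSD restart is the workaround Linh Vu described for the redundant copies not being purged:

# ceph osd pool set <poolname> size 2
# ceph osd pool set <poolname> min_size 2
# systemctl restart ceph-osd@<id>     (one OSD at a time, waiting for the cluster to settle in between)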
[ceph-users] NFS-ganesha .debs not on download.ceph.com
I've got Ceph running on Ubuntu 20.04 using ceph-ansible, and I noticed that the .deb files for NFS-ganesha aren't on download.ceph.com. It seems the files should be here: https://download.ceph.com/nfs-ganesha/deb-V3.5-stable/pacific but "deb-V3.5-stable" doesn't exist. Poking around, I can see there's no Debian repo for NFS-ganesha for Pacific. Is this an error, or should ceph-ansible be configured to look elsewhere for the repo?

--
Regards,
Richard J. Zak
Professional Genius
PGP Key: https://keybase.io/rjzak/key.asc
[ceph-users] Re: CephFS Metadata Pool bandwidth usage
Hi Xiubo,

Thanks very much for looking into this, that does sound like what might be happening in our case.

Is this something that can be improved somehow - would disabling pinning or some config change help? Or could this be addressed in a future release? It seems somewhat excessive to write so much metadata for each operation; it makes metadata operations constrained by the disk capacity.

We were happy to use pinning as it's very natural for us and seems to make the FS more stable, however the metadata bandwidth usage is becoming a real issue.

Thanks very much for your help,

Kind regards,
András

On Thu, Dec 16, 2021, 03:43 Xiubo Li wrote:
> I have figured out one case that may cause this, please see the tracker
> https://tracker.ceph.com/issues/53623.
>
> In case there is a large number of subtrees in mds.0, the size of an
> ESubtreeMap event could reach up to 4MB. Then it's possible that each
> LogSegment will only contain 2 events, which means that when an MDS
> submits each new event, it will possibly use a whole LogSegment.
>
> I think most of the bandwidth usage could be writing the ESubtreeMap
> events to the metadata pool.
>
> -- Xiubo
>
> On 12/13/21 9:52 PM, Gregory Farnum wrote:
> > I see Xiubo started discussing this on https://tracker.ceph.com/issues/53542 as well.
> >
> > So the large writes are going to the journal file, and sometimes it's
> > a single write of a full segment size, which is what I was curious
> > about.
> >
> > At this point the next step is seeing what is actually taking up that
> > space. You could turn up logging and send in a snippet, but I think
> > the easiest thing is going to involve:
> > * track one of those 4 MB full-object writes
> > Either a) pull the object in question off disk and look at it using
> > ceph-dencoder, or
> > b) Use cephfs-journal-tool to inspect the relevant journal range
> >
> > From your output below you could grab 200.02c7a084 (which is at
> > journal offset 0x2c7a084*4MiB), though that's probably been expired by
> > this point so you'll need to get another dump which contains a large
> > one. I haven't looked at these data structures using these tools in a
> > while so I'll leave more detail up to Xiubo.
> > -Greg
> >
> > On Fri, Dec 10, 2021 at 12:48 AM Andras Sali wrote:
> > > Hi Greg,
> > >
> > > As a follow up, we see items similar to this pop up in the objecter_requests (when it's not empty).
Not sure if reading it right, but some appear quite > large (in the MB range?): > > { > "ops": [ > { > "tid": 9532804, > "pg": "3.f9c235d7", > "osd": 2, > "object_id": "200.02c7a084", > "object_locator": "@3", > "target_object_id": "200.02c7a084", > "target_object_locator": "@3", > "paused": 0, > "used_replica": 0, > "precalc_pgid": 0, > "last_sent": "1121127.434264s", > "age": 0.0160001041, > "attempts": 1, > "snapid": "head", > "snap_context": "0=[]", > "mtime": "2021-12-10T08:35:34.582215+", > "osd_ops": [ > "write 0~4194304 [fadvise_dontneed] in=4194304b" > ] > }, > { > "tid": 9532806, > "pg": "3.abba2e66", > "osd": 2, > "object_id": "200.02c7a085", > "object_locator": "@3", > "target_object_id": "200.02c7a085", > "target_object_locator": "@3", > "paused": 0, > "used_replica": 0, > "precalc_pgid": 0, > "last_sent": "1121127.438264s", > "age": 0.012781, > "attempts": 1, > "snapid": "head", > "snap_context": "0=[]", > "mtime": "2021-12-10T08:35:34.589044+", > "osd_ops": [ > "write 0~1236893 [fadvise_dontneed] in=1236893b" > ] > }, > { > "tid": 9532807, > "pg": "3.abba2e66", > "osd": 2, > "object_id": "200.02c7a085", > "object_locator": "@3", > "target_object_id": "200.02c7a085", > "target_object_locator": "@3", > "paused": 0, > "used_replica": 0, > "precalc_pgid": 0, > "last_sent": "1121127.442264s", > "age": 0.0085206, > "attempts": 1, > "snapid": "head", > "snap_context": "0=[]", > "mtime": "2021-12-10T08:35:34.592283+", > "osd_ops": [ > "write 1236893~510649 [fadvise_dontneed] in=510649b" > ] > }, > { > "tid": 9532808, > "pg": "3.abba2e66", > "osd": 2, > "object_id": "200.02c7a085", > "object_locator": "@3", > "target_object_id": "200.02c7a085", > "target_object_locator": "@3", >
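To make the two options Greg mentions above concrete, a rough sketch -- the metadata pool name and file system name are assumptions and need to be replaced with the real ones, and the object has to be fetched before the journal expires it:

rados -p cephfs_metadata get 200.02c7a084 /tmp/200.02c7a084.bin
cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 event get summary

The event summary groups journal events by type, which should make a journal dominated by large ESubtreeMap entries fairly obvious.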
[ceph-users] Re: RBD mirroring bootstrap peers - direction
On 12/15/21 14:18, Arthur Outhenin-Chalandre wrote:
> On 12/15/21 13:50, Torkil Svensgaard wrote:
>> Ah, so as long as I don't run the mirror daemons on site-a there is no
>> risk of overwriting production data there?
>
> To be perfectly clear, there should be no risk whatsoever (as Ilya also
> said). I suggested not running rbd-mirror on site-a so that replication
> from site-b to site-a wouldn't be a thing at all. That being said, we also
> run a setup where we only need one-way replication, but for the same
> reasons posted by Ilya we use rx-tx and run rbd-mirror at both sites.

Hi Arthur

Thanks for the clarification. I set up one peer with rx-tx, and it seems to replicate as it should, but the site-a status looks a little odd. Why down+unknown and status not found? Because of an rx-tx peer with only one way active?

site-a:

rbd mirror image status rbd_internal/store
athos: Thu Dec 16 07:51:26 2021
store:
  global_id:   4888eab6-f6f4-4a11-9e91-e3446651f911
  state:       down+unknown
  description: status not found
  last_update:
  peer_sites:
    name: dcn-ceph
    state: up+replaying
    description: replaying, {"bytes_per_second":272384136.53,"bytes_per_snapshot":0.0,"remote_snapshot_timestamp":1639585129,"replay_state":"syncing","seconds_until_synced":0,"syncing_percent":12,"syncing_snapshot_timestamp":1639585129}
    last_update: 2021-12-16 07:51:04
  snapshots:
    20 .mirror.primary.4888eab6-f6f4-4a11-9e91-e3446651f911.137a0eeb-c54d-4f0f-9cb1-3cdc87a891d4 (peer_uuids:[bf1978c0-4231-43ca-831f-669bf4a898b2])

site-b:

rbd mirror image status rbd_internal/store
dcn-ceph-01: Thu Dec 16 06:53:46 2021
store:
  global_id:   4888eab6-f6f4-4a11-9e91-e3446651f911
  state:       up+replaying
  description: replaying, {"bytes_per_second":252701491.2,"bytes_per_snapshot":0.0,"remote_snapshot_timestamp":1639585129,"replay_state":"syncing","seconds_until_synced":0,"syncing_percent":13,"syncing_snapshot_timestamp":1639585129}
  service:     dcn-ceph-01.itashe on dcn-ceph-01
  last_update: 2021-12-16 06:53:34

Here's one with rx-only. That looks more peaceful:

site-a:

rbd mirror image status rbd/mail
athos: Thu Dec 16 07:58:10 2021
mail:
  global_id: 2b3d355c-d095-45a4-8c29-80f059d78483
  snapshots:
    116 .mirror.primary.2b3d355c-d095-45a4-8c29-80f059d78483.82d2240a-319f-4990-84e8-98284864032c (peer_uuids:[cabf78ce-f65f-4a27-a648-20b3fd326647])

site-b:

rbd mirror image status rbd/mail
dcn-ceph-01: Thu Dec 16 06:57:22 2021
mail:
  global_id:   2b3d355c-d095-45a4-8c29-80f059d78483
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":1140323336192.0,"local_snapshot_timestamp":1639573987,"remote_snapshot_timestamp":1639573987,"replay_state":"idle"}
  service:     dcn-ceph-01.itashe on dcn-ceph-01
  last_update: 2021-12-16 06:57:07

Best regards,
Torkil

--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk
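One way to dig into the down+unknown state might be the pool-level status on each cluster, which with --verbose should list the rbd-mirror daemons registered against the pool and therefore whether anything on the 714-ceph side is expected to report a status for the rx-tx image at all (pool name as in the post):

rbd mirror pool status rbd_internal --verbose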