[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-15 Thread Michael Uleysky
Thanks!

As far as I can see, this is the same problem as mine.

ср, 15 дек. 2021 г. в 16:49, Chris Dunlop :

> On Wed, Dec 15, 2021 at 02:05:05PM +1000, Michael Uleysky wrote:
> > I am trying to upgrade a three-node Nautilus cluster to Pacific. I am
> > updating ceph on one node and restarting daemons. The OSDs are OK, but
> > the monitor cannot enter quorum.
>
> Sounds like the same thing as:
>
> Pacific mon won't join Octopus mons
> https://tracker.ceph.com/issues/52488
>
> Unfortunately there's no resolution.
>
> For a bit more background, see also the thread starting:
>
> New pacific mon won't join with octopus mons
> https://www.spinics.net/lists/ceph-devel/msg52181.html
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to clean up data in OSDs

2021-12-15 Thread Nagaraj Akkina
Hello Team,

After testing our cluster we removed and recreated all Ceph pools, which
cleaned up all users and buckets, but we can still see data on the disks.
Is there an easy way to clean up all OSDs without actually removing and
reconfiguring them?
What would be the best way to solve this problem? Currently we are
experiencing RGW daemon crashes as RADOS still tries to look into the old buckets.
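
In case it is useful, this is roughly how we look at what is still consuming
space (a rough sketch; the pool names below are just the RGW defaults and
ours may differ):

# per-pool usage, including the recreated RGW pools
ceph df detail
rados df

# peek at leftover objects in one data pool (pool name is an example)
rados -p default.rgw.buckets.data ls | head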

Any help is much appreciated.

Regards,
Akkina
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] what does "Message has implicit destination" mean

2021-12-15 Thread Marc




The message is being held because:

Message has implicit destination
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Snapshot mirroring problem

2021-12-15 Thread Torkil Svensgaard

Hi

I'm having trouble getting snapshot replication to work. I have 2
clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS
Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph ->
dcn-ceph.


Adding peer:

"
# rbd mirror pool info
Mode: image
Site Name: dcn-ceph

Peer Sites: none

# rbd --cluster dcn-ceph mirror pool peer bootstrap import --direction rx-only --site-name dcn-ceph rbd /tmp/token
2021-12-15T08:24:20.250+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-15T08:24:20.251+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
2021-12-15T08:24:20.251+ 7fa8b498d2c0 -1 auth: unable to find a keyring on /etc/ceph/..keyring,/etc/ceph/.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory

# dcn-ceph-01/root tmp
# rbd mirror pool info
Mode: image
Site Name: dcn-ceph

Peer Sites:

UUID: cd68b1bb-3e0b-4f9d-bd52-4ff5804c9746
Name: 714-ceph
Direction: rx-only
Client: client.rbd-mirror-peer
"

I presume the error is benign, as per
https://bugzilla.redhat.com/show_bug.cgi?id=1981186, and the peer
relation seems to be established.


On 714-ceph:

"
# rbd mirror pool info
Mode: image
Site Name: 714-ceph

Peer Sites:

UUID: cabf78ce-f65f-4a27-a648-20b3fd326647
Name: dcn-ceph
Mirror UUID: 4132f9e2-555f-4363-8c62-72f0db37f700
Direction: tx-only

# rbd mirror pool info rbd
Mode: image
Site Name: 714-ceph

Peer Sites:

UUID: cabf78ce-f65f-4a27-a648-20b3fd326647
Name: dcn-ceph
Mirror UUID: 4132f9e2-555f-4363-8c62-72f0db37f700
Direction: tx-only

rbd mirror image enable rbd/rbdmirrortest snapshot

# rbd info rbd/rbdmirrortest
rbd image 'rbdmirrortest':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 1
id: 46e06591893e12
block_name_prefix: rbd_data.46e06591893e12
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Tue Dec 14 08:46:59 2021
access_timestamp: Tue Dec 14 10:30:01 2021
modify_timestamp: Tue Dec 14 08:46:59 2021
mirroring state: enabled
mirroring mode: snapshot
mirroring global id: 3c92991f-b0ae-496f-adb6-f2f25cbb2220
mirroring primary: true

# rbd mirror image status rbd/rbdmirrortest
rbdmirrortest:
  global_id:   3c92991f-b0ae-496f-adb6-f2f25cbb2220
  snapshots:
    109 .mirror.primary.3c92991f-b0ae-496f-adb6-f2f25cbb2220.6d267de5-e9db-4dfd-b626-a4df87ad2485 (peer_uuids:[cabf78ce-f65f-4a27-a648-20b3fd326647])

"

Looks good?

At the dcn-ceph end the image is created, but then nothing happens:

"
# rbd info rbd/rbdmirrortest
rbd image 'rbdmirrortest':
size 100 GiB in 25600 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 8e3863f675783d
data_pool: rbd_data
block_name_prefix: rbd_data.4.8e3863f675783d
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten, data-pool, non-primary
op_features:
flags:
create_timestamp: Wed Dec 15 08:25:20 2021
access_timestamp: Wed Dec 15 08:25:20 2021
modify_timestamp: Wed Dec 15 08:25:20 2021
mirroring state: unknown
mirroring mode: snapshot
mirroring global id: 3c92991f-b0ae-496f-adb6-f2f25cbb2220
mirroring primary: false

# rbd mirror image status rbd/rbdmirrortest
rbd: mirroring not enabled on the image
"

Any ideas?

Thanks,

Torkil



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snapshot mirroring problem

2021-12-15 Thread Arthur Outhenin-Chalandre
Hi Torkil,

On 12/15/21 09:45, Torkil Svensgaard wrote:
> I'm having trouble getting snapshot replication to work. I have 2
> clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS
> Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph ->
> dcn-ceph.

I didn't try the one way replication myself with the snapshot mode so I
can't say for sure, but there is an issue in 16.2.6 [1]. It has been
fixed and backported into 16.2.7, an update to that version may solve
your problem!

[1]: https://tracker.ceph.com/issues/52675
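
A quick way to double-check which version each daemon, including
rbd-mirror, is actually running before and after the upgrade (just a
sketch):

# per-daemon-type version breakdown across the cluster
ceph versions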

Cheers,

-- 
Arthur Outhenin-Chalandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MAX AVAIL capacity mismatch || mimic(13.2)

2021-12-15 Thread Md. Hejbul Tawhid MUNNA
Hi,

Our total number of HDD OSDs is 40, and 40 x 5.5 TB = 220 TB. We are using 3
replicas for every pool, so "MAX AVAIL" should show 220 / 3 = 73.3 TB. Am I right?

What is the meaning of "variance 1.x"? I think we might have a wrong
configuration, but I need to find it.

We also have some SSD OSDs; yes, the total capacity is shown as HDD + SSD,
but the per-pool MAX AVAIL should differ.
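
To spell out the arithmetic I am assuming (just a sketch; the real MAX AVAIL
also accounts for the pool's CRUSH rule and the fullest OSD):

# naive expectation: 40 HDD OSDs x 5.5 TiB raw, size=3 replication
echo "scale=1; 40 * 5.5 / 3" | bc    # => 73.3 (TiB)

# what Ceph actually reports per pool
ceph df detail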


# ceph osd df
ID CLASS WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE  VAR  PGS
 0   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.74 1.35 871
 1   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.25 1.22 840
 2   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.53 1.13 831
 3   hdd 5.57100  1.0 5.6 TiB 2.2 TiB 3.4 TiB 38.80 1.39 888
 4   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.22 1.19 866
 5   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 36.12 1.29 837
 6   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.12 1.15 858
 7   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 29.63 1.06 851
 8   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.57 1.20 799
 9   hdd 5.57100  1.0 5.6 TiB 1.6 TiB 4.0 TiB 28.73 1.03 793
10   hdd 5.57100  1.0 5.6 TiB 1.6 TiB 3.9 TiB 29.51 1.05 839
11   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 36.19 1.29 860
12   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.61 1.20 904
13   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.52 1.16 807
14   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 34.17 1.22 845
15   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 37.61 1.34 836
16   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.12 1.11 881
17   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.66 1.17 876
18   hdd 5.57100  1.0 5.6 TiB 2.4 TiB 3.2 TiB 42.29 1.51 860
19   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 29.93 1.07 828
20   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.65 1.24 854
21   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.62 1.20 845
22   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.21 1.19 797
23   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.75 1.31 839
24   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.5 TiB 36.98 1.32 829
25   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.86 1.10 878
26   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.68 1.31 867
27   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.13 1.11 842
28   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 32.12 1.15 821
29   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.44 1.19 871
30   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.6 TiB 35.97 1.29 813
31   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.9 TiB 30.60 1.09 812
32   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.65 1.24 836
33   hdd 5.57100  1.0 5.6 TiB 1.8 TiB 3.8 TiB 31.57 1.13 884
34   hdd 5.57100  1.0 5.6 TiB 2.0 TiB 3.5 TiB 36.67 1.31 829
35   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.6 TiB 34.79 1.24 900
36   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.76 1.21 838
37   hdd 5.57100  1.0 5.6 TiB 2.1 TiB 3.4 TiB 38.21 1.37 796
38   hdd 5.57100  1.0 5.6 TiB 1.7 TiB 3.8 TiB 31.26 1.12 841
39   hdd 5.57100  1.0 5.6 TiB 1.9 TiB 3.7 TiB 33.76 1.21 830
40   ssd 1.81898  1.0 1.8 TiB  22 GiB 1.8 TiB  1.18 0.04 112
42   ssd 1.81879  1.0 1.8 TiB  21 GiB 1.8 TiB  1.12 0.04 107
43   ssd 1.81879  1.0 1.8 TiB  24 GiB 1.8 TiB  1.27 0.05 121
44   ssd 1.81879  1.0 1.8 TiB  20 GiB 1.8 TiB  1.06 0.04 101
45   ssd 1.81879  1.0 1.8 TiB  23 GiB 1.8 TiB  1.24 0.04 116
46   ssd 1.81879  1.0 1.8 TiB  24 GiB 1.8 TiB  1.27 0.05 120
47   ssd 1.81879  1.0 1.8 TiB  22 GiB 1.8 TiB  1.17 0.04 110
48   ssd 1.81879  1.0 1.8 TiB  23 GiB 1.8 TiB  1.26 0.04 120
49   ssd 1.81879  1.0 1.8 TiB  23 GiB 1.8 TiB  1.21 0.04 117
41   ssd 1.81898  1.0 1.8 TiB  18 GiB 1.8 TiB  0.97 0.03  94
50   ssd 1.81940  1.0 1.8 TiB  22 GiB 1.8 TiB  1.19 0.04 115
51   ssd 1.81940  1.0 1.8 TiB  19 GiB 1.8 TiB  1.03 0.04  98
52   ssd 1.81940  1.0 1.8 TiB  22 GiB 1.8 TiB  1.16 0.04 109
53   ssd 1.81940  1.0 1.8 TiB  21 GiB 1.8 TiB  1.13 0.04 105
54   ssd 1.81940  1.0 1.8 TiB  25 GiB 1.8 TiB  1.36 0.05 128
55   ssd 1.81940  1.0 1.8 TiB  22 GiB 1.8 TiB  1.19 0.04 113
56   ssd 1.81940  1.0 1.8 TiB  27 GiB 1.8 TiB  1.43 0.05 140
57   ssd 1.81940  1.0 1.8 TiB  24 GiB 1.8 TiB  1.29 0.05 122
58   ssd 1.81940  1.0 1.8 TiB  21 GiB 1.8 TiB  1.13 0.04 107
59   ssd 1.81940  1.0 1.8 TiB  21 GiB 1.8 TiB  1.12 0.04 111
60   ssd 1.81940  1.0 1.8 TiB  27 GiB 1.8 TiB  1.45 0.05 137
61   ssd 1.81940  1.0 1.8 TiB  23 GiB 1.8 TiB  1.24 0.04 117
62   ssd 1.81940  1.0 1.8 TiB  22 GiB 1.8 TiB  1.16 0.04 112
63   ssd 1.81940  1.0 1.8 TiB  25 GiB 1.8 TiB  1.32 0.05 126
64   ssd 1.81940  1.0 1.8 TiB  23 GiB 1.8 TiB  1.23 0.04 115
65   ssd 1.81940  1.0 1.8 TiB  20 GiB 1.8 TiB  1.07 0.04  99
66   ssd 1.81940  1.0 1.8 TiB  19 GiB 1.8 TiB  1.03 0.04 100
TOTAL 272 TiB  76 TiB 196 TiB 27.99


# ceph df
GLOBAL:
SIZEAVAIL   RAW USED %RAW USED
272 TiB 196 TiB   76 TiB 28.02

[ceph-users] RBD mirroring bootstrap peers - direction

2021-12-15 Thread Torkil Svensgaard

Hi

I'm confused by the direction parameter in the documentation[1]. If I 
have my data at site-a and want one way replication to site-b should the 
mirroring be configured as the documentation example, directionwise?


E.g.

rbd --cluster site-a mirror pool peer bootstrap create --site-name site-a image-pool (get token)

rbd --cluster site-b mirror pool peer bootstrap import --site-name site-b --direction rx-only image-pool token
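
And then, I assume, rbd-mirror daemons would only run at site-b, e.g. with
cephadm (placement is just an example):

# deploy the mirror daemon on the receiving site only
ceph orch apply rbd-mirror --placement=site-b-host1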


Mvh.

Torkil

[1] https://docs.ceph.com/en/latest/rbd/rbd-mirroring/#bootstrap-peers
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: what does "Message has implicit destination" mean

2021-12-15 Thread Janne Johansson
Den ons 15 dec. 2021 kl 09:35 skrev Marc :
> The message is being held because:
>
> Message has implicit destination

Usually it's something like "the mailing list wasn't in the To: field, but only in CC: or BCC:".

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Arthur Outhenin-Chalandre
Hi Torkil,

On 12/15/21 13:24, Torkil Svensgaard wrote:
> I'm confused by the direction parameter in the documentation[1]. If I 
> have my data at site-a and want one way replication to site-b should the 
> mirroring be configured as the documentation example, directionwise?

What you are describing seems at first glance right. The rbd-mirror
daemon semantic is to replicate data from a remote cluster to the local
cluster. But I am not sure, I use rx-tx everywhere...

Also the default rx-tx would probably also work in your case as long as
you don't try to run rbd-mirror on site-a.

Cheers,

-- 
Arthur Outhenin-Chalandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Torkil Svensgaard

On 15/12/2021 13.44, Arthur Outhenin-Chalandre wrote:

Hi Torkil,


Hi Arthur


On 12/15/21 13:24, Torkil Svensgaard wrote:

I'm confused by the direction parameter in the documentation[1]. If I
have my data at site-a and want one way replication to site-b should the
mirroring be configured as the documentation example, directionwise?


What you are describing seems at first glance right. The rbd-mirror
daemon semantic is to replicate data from a remote cluster to the local
cluster. But I am not sure, I use rx-tx everywhere...

Also the default rx-tx would probably also work in your case as long as
you don't try to run rbd-mirror on site-a.


Ah, so as long as I don't run the mirror daemons on site-a there is no 
risk of overwriting production data there?


I'm upgrading to 16.2.7 as you suggested in the other thread[1]. If that 
doesn't fix the issue I found another thread[2] suggesting the direction 
should be reversed, but that sounded a bit scary.


Thanks,

Torkil

[1] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MGG7X5ITC4XA3JREAWU74DDEZTWLSSZE/
[2] 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/NOFX6TXZ7WRUV2ZSTI4N6EP73YN6JKQQ/



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Ilya Dryomov
Hi Torkil,

I would recommend sticking to rx-tx to make a potential failback to
the primary cluster easier.  There shouldn't be any issue with running
rbd-mirror daemons at both sites either -- it doesn't start replicating
until it is instructed to, either per-pool or per-image.
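
To make that concrete, a sketch (pool/image names are placeholders) -
nothing is replicated until something like the following is run:

# enable mirroring per pool (image mode), then opt in a specific image
rbd mirror pool enable rbd image
rbd mirror image enable rbd/myimage snapshot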

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Snapshot mirroring problem

2021-12-15 Thread Torkil Svensgaard



On 15/12/2021 10.17, Arthur Outhenin-Chalandre wrote:

Hi Torkil,


Hi Arthur


On 12/15/21 09:45, Torkil Svensgaard wrote:

I'm having trouble getting snapshot replication to work. I have 2
clusters, 714-ceph on RHEL/16.2.0-146.el8cp and dcn-ceph on CentOS
Stream 8/16.2.6. I'm trying to enable one-way replication from 714-ceph ->
dcn-ceph.


I didn't try the one way replication myself with the snapshot mode so I
can't say for sure, but there is an issue in 16.2.6 [1]. It has been
fixed and backported into 16.2.7, an update to that version may solve
your problem!


Thanks, that did the trick =)

Mvh.

Torkil


[1]: https://tracker.ceph.com/issues/52675

Cheers,


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Torkil Svensgaard

On 15/12/2021 13.58, Ilya Dryomov wrote:

Hi Torkil,


Hi Ilya


I would recommend sticking to rx-tx to make a potential failback to
the primary cluster easier.  There shouldn't be any issue with running
rbd-mirror daemons at both sites either -- it doesn't start replicating
until it is instructed to, either per-pool or per-image.


Thanks for the clarification.

Mvh.

Torkil
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Arthur Outhenin-Chalandre
On 12/15/21 13:50, Torkil Svensgaard wrote:
> Ah, so as long as I don't run the mirror daemons on site-a there is no 
> risk of overwriting production data there?

To be perfectly clear there should be no risk whatsoever (as Ilya also
said). I suggested to not run rbd-mirror on site-a so that replication
from site-b to site-a wouldn't be a thing at all.

That being said we also run a setup where we only need one way
replication but for the same reasons posted by Ilya we use rx-tx and run
rbd-mirror in both sites.

Cheers,

-- 
Arthur Outhenin-Chalandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-15 Thread Gregory Farnum
Hmm that ticket came from the slightly unusual scenario where you were
deploying a *new* Pacific monitor against an Octopus cluster.

Michael, is your cluster deployed with cephadm? And is this a new or
previously-existing monitor?

On Wed, Dec 15, 2021 at 12:09 AM Michael Uleysky  wrote:
>
> Thanks!
>
> As far as I can see, this is the same problem as mine.
>
> ср, 15 дек. 2021 г. в 16:49, Chris Dunlop :
>
> > On Wed, Dec 15, 2021 at 02:05:05PM +1000, Michael Uleysky wrote:
> > > I am trying to upgrade a three-node Nautilus cluster to Pacific. I am
> > > updating ceph on one node and restarting daemons. The OSDs are OK, but
> > > the monitor cannot enter quorum.
> >
> > Sounds like the same thing as:
> >
> > Pacific mon won't join Octopus mons
> > https://tracker.ceph.com/issues/52488
> >
> > Unfortunately there's no resolution.
> >
> > For a bit more background, see also the thread starting:
> >
> > New pacific mon won't join with octopus mons
> > https://www.spinics.net/lists/ceph-devel/msg52181.html
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Large latency for single thread

2021-12-15 Thread norman.kern
I created an RBD pool using only two SATA SSDs (one for data, the other for
the database/WAL), and set the replica size to 1.

After that, I set up a fio test on the same host where the OSD is placed. I
found the latency is hundreds of microseconds (sixty microseconds for the
raw SATA SSD).
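
For reference, the job corresponds roughly to the following invocation (a
sketch: whether the rbd ioengine or a mapped krbd device was used isn't
shown below, and the pool/image names are placeholders):

# 4k sequential write, queue depth 1, single job, 180s, via librbd
fio --name=m-seqwr-004k-001q-001j --ioengine=rbd --clientname=admin \
    --pool=rbd --rbdname=testimg --rw=write --bs=4k --iodepth=1 \
    --numjobs=1 --direct=1 --runtime=180 --time_based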


The fio output:

m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15 14:05:32 2021
  write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone resets
    slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
    clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
     lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
    clat percentiles (usec):
     |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824], 20.00th=[  906],
     | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[ 1303],
     | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[ 1663],
     | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[ 3949],
     | 99.99th=[ 6718]
   bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54, stdev=588.79, samples=360
   iops        : min=  482, max= 1262, avg=794.76, stdev=147.20, samples=360
  lat (usec)   : 750=2.98%, 1000=22.41%
  lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
  cpu          : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1


Parts of the OSD's perf status:

 "state_io_done_lat": {
    "avgcount": 151295,
    "sum": 0.336297058,
    "avgtime": 0.0
    },
    "state_kv_queued_lat": {
    "avgcount": 151295,
    "sum": 18.812333051,
    "avgtime": 0.000124342
    },
    "state_kv_commiting_lat": {
    "avgcount": 151295,
    "sum": 64.555436175,
    "avgtime": 0.000426685
    },
    "state_kv_done_lat": {
    "avgcount": 151295,
    "sum": 0.130403628,
    "avgtime": 0.00861
    },
    "state_deferred_queued_lat": {
    "avgcount": 148,
    "sum": 215.726286547,
    "avgtime": 1.457610044
    },

... ...

    "op_w_latency": {
    "avgcount": 151133,
    "sum": 130.134246667,
    "avgtime": 0.000861057
    },
    "op_w_process_latency": {
    "avgcount": 151133,
    "sum": 125.301196872,
    "avgtime": 0.000829079
    },
    "op_w_prepare_latency": {
    "avgcount": 151133,
    "sum": 29.892687947,
    "avgtime": 0.000197790
    },

Is this reasonable for this benchmark test case? And how can I improve it?
It's really NOT friendly for single-threaded workloads.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph-mon pacific doesn't enter to quorum of nautilus cluster

2021-12-15 Thread Robert Sander

On 15.12.21 05:59, Linh Vu wrote:

May not be directly related to your error, but they slap a DO NOT UPGRADE
FROM AN OLDER VERSION label on the Pacific release notes for a reason...

https://docs.ceph.com/en/latest/releases/pacific/


This is an unrelated issue (bluestore_fsck_quick_fix_on_mount) that has 
been fixed with 16.2.7. This page should be updated.


The 16.2.7 release is currently not in the release index. Is there a 
reason for that?


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large latency for single thread

2021-12-15 Thread Marc
Is this not just inherent to SDS? And wait for the new osd code, I think they 
are working on it.

https://yourcmc.ru/wiki/Ceph_performance


> 
> m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15
> 14:05:32 2021
>    write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone
> resets
>      slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
>      clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
>   lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
>      clat percentiles (usec):
>   |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824],
> 20.00th=[  906],
>   | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[
> 1303],
>   | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[
> 1663],
>   | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[
> 3949],
>   | 99.99th=[ 6718]
>     bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54,
> stdev=588.79, samples=360
>     iops    : min=  482, max= 1262, avg=794.76, stdev=147.20,
> samples=360
>    lat (usec)   : 750=2.98%, 1000=22.41%
>    lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
>    cpu  : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
>    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>  >=64=0.0%
>   submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>  >=64=0.0%
>   issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
>   latency   : target=0, window=0, percentile=100.00%, depth=1
> 
> 
> Parts of the OSD' perf status:
> 
>   "state_io_done_lat": {
>      "avgcount": 151295,
>      "sum": 0.336297058,
>      "avgtime": 0.0
>      },
>      "state_kv_queued_lat": {
>      "avgcount": 151295,
>      "sum": 18.812333051,
>      "avgtime": 0.000124342
>      },
>      "state_kv_commiting_lat": {
>      "avgcount": 151295,
>      "sum": 64.555436175,
>      "avgtime": 0.000426685
>      },
>      "state_kv_done_lat": {
>      "avgcount": 151295,
>      "sum": 0.130403628,
>      "avgtime": 0.00861
>      },
>      "state_deferred_queued_lat": {
>      "avgcount": 148,
>      "sum": 215.726286547,
>      "avgtime": 1.457610044
>      },
> 
> ... ...
> 
>      "op_w_latency": {
>      "avgcount": 151133,
>      "sum": 130.134246667,
>      "avgtime": 0.000861057
>      },
>      "op_w_process_latency": {
>      "avgcount": 151133,
>      "sum": 125.301196872,
>      "avgtime": 0.000829079
>      },
>      "op_w_prepare_latency": {
>      "avgcount": 151133,
>      "sum": 29.892687947,
>      "avgtime": 0.000197790
>      },
> 
> Is it reasonable for the benchmark test case?  And how to improve it?
> It's really NOT friendly for single thread.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large latency for single thread

2021-12-15 Thread Mark Nelson
FWIW, we ran single OSD, iodepth=1 O_DSYNC write tests against classic 
and crimson bluestore OSDs in our Q3 crimson slide deck. You can see the 
results starting on slide 32 here:



https://docs.google.com/presentation/d/1eydyAFKRea8n-VniQzXKW8qkKM9GLVMJt2uDjipJjQA/edit#slide=id.gf880cf6296_1_73


That was with the OSD restricted to 2 cores, but for these tests it 
shouldn't really matter.  Also keep in mind that the fio client was on 
localhost as well.  Note that Crimson is less efficient than the classic
OSD in this test (while being more efficient in other tests) because the
reactor works in a tight loop to reduce latency, and since the OSD
isn't doing a ton of IO, that loop ends up dominating in terms of CPU usage.
Seastar provides an option to make the reactor a bit lazier, which lowers
idle CPU consumption, but we don't utilize it yet.



Running with replication across multiple OSDs (which requires round trips
to multiple replicas) does make this tougher to do well on a real
cluster.  I suspect that long term crimson should be better at this kind
of workload vs classic, but with synchronous replication we're always
going to be fighting against the slowest link.



Mark

On 12/15/21 12:44 PM, Marc wrote:

Is this not just inherent to SDS? And wait for the new osd code, I think they 
are working on it.

https://yourcmc.ru/wiki/Ceph_performance



m-seqwr-004k-001q-001j: (groupid=0, jobs=1): err= 0: pid=46: Wed Dec 15
14:05:32 2021
    write: IOPS=794, BW=3177KiB/s (3254kB/s)(559MiB/180002msec); 0 zone
resets
      slat (usec): min=4, max=123, avg=22.30, stdev= 9.18
      clat (usec): min=630, max=16977, avg=1232.89, stdev=354.67
   lat (usec): min=639, max=17009, avg=1255.19, stdev=358.99
      clat percentiles (usec):
   |  1.00th=[  709],  5.00th=[  775], 10.00th=[  824],
20.00th=[  906],
   | 30.00th=[ 1074], 40.00th=[ 1172], 50.00th=[ 1237], 60.00th=[
1303],
   | 70.00th=[ 1369], 80.00th=[ 1450], 90.00th=[ 1565], 95.00th=[
1663],
   | 99.00th=[ 2606], 99.50th=[ 3261], 99.90th=[ 3785], 99.95th=[
3949],
   | 99.99th=[ 6718]
     bw (  KiB/s): min= 1928, max= 5048, per=100.00%, avg=3179.54,
stdev=588.79, samples=360
     iops    : min=  482, max= 1262, avg=794.76, stdev=147.20,
samples=360
    lat (usec)   : 750=2.98%, 1000=22.41%
    lat (msec)   : 2=73.38%, 4=1.18%, 10=0.04%, 20=0.01%
    cpu  : usr=2.69%, sys=1.78%, ctx=145218, majf=0, minf=2
    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
  >=64=0.0%
   submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
  >=64=0.0%
   complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
  >=64=0.0%
   issued rwts: total=0,142985,0,0 short=0,0,0,0 dropped=0,0,0,0
   latency   : target=0, window=0, percentile=100.00%, depth=1


Parts of the OSD' perf status:

   "state_io_done_lat": {
      "avgcount": 151295,
      "sum": 0.336297058,
      "avgtime": 0.0
      },
      "state_kv_queued_lat": {
      "avgcount": 151295,
      "sum": 18.812333051,
      "avgtime": 0.000124342
      },
      "state_kv_commiting_lat": {
      "avgcount": 151295,
      "sum": 64.555436175,
      "avgtime": 0.000426685
      },
      "state_kv_done_lat": {
      "avgcount": 151295,
      "sum": 0.130403628,
      "avgtime": 0.00861
      },
      "state_deferred_queued_lat": {
      "avgcount": 148,
      "sum": 215.726286547,
      "avgtime": 1.457610044
      },

... ...

      "op_w_latency": {
      "avgcount": 151133,
      "sum": 130.134246667,
      "avgtime": 0.000861057
      },
      "op_w_process_latency": {
      "avgcount": 151133,
      "sum": 125.301196872,
      "avgtime": 0.000829079
      },
      "op_w_prepare_latency": {
      "avgcount": 151133,
      "sum": 29.892687947,
      "avgtime": 0.000197790
      },

Is it reasonable for the benchmark test case?  And how to improve it?
It's really NOT friendly for single thread.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-15 Thread Marco Pizzolo
Thanks Linh Vu, so it sounds like I should be prepared to bounce the OSDs
and/or hosts, but I haven't heard anyone yet say that it won't work, so I
guess there's that...
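
For the record, my understanding is that the change itself is just the pool
size setting plus the rolling restarts Linh described (a sketch; the pool
name and OSD id are examples):

# shrink replication on one pool and watch recovery
ceph osd pool set cephfs_data size 2
ceph -s

# then restart OSDs one at a time so the redundant copies get purged
systemctl restart ceph-osd@12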

On Tue, Dec 14, 2021 at 7:48 PM Linh Vu  wrote:

> I haven't tested this in Nautilus 14.2.22 (or any nautilus) but in
> Luminous or older, if you go from a bigger size to a smaller size, there
> was either a bug or a "feature-not-bug" that didn't allow the OSDs to
> automatically purge the redundant PGs with data copies. I did this on a
> size=5 to size=3 situation in a 1000+ OSD cluster, and also just recently
> in a test Luminous cluster (size=3 to size=2). In order for the purge to
> actually happen, I had to restart every OSD (one at a time for safety, or
> just run ceph-ansible site.yml with the osd handler health check = true).
>
> On Wed, Dec 15, 2021 at 8:58 AM Marco Pizzolo 
> wrote:
>
>> Hi Martin,
>>
>> Agreed on the min_size of 2.  I have no intention of worrying about uptime
>> in event of a host failure.  Once size of 2 is effectuated (and I'm unsure
>> how long it will take), it is our intention to evacuate all OSDs in one of
>> 4 hosts, in order to migrate the host to the new cluster, where its OSDs
>> will then be added in.  Once added and balanced, we will complete the
>> copies (<3 days) and then migrate one more host allowing us to bring size
>> to 3.  Once balanced, we will collapse the last 2 nodes into the new
>> cluster.  I am hoping that inclusive of rebalancing the whole project will
>> only take 3 weeks, but time will tell.
>>
>> Has anyone asked Ceph to reduce hundreds of millions if not billions of
>> files from size 3 to size 2, and if so, were you successful?  I know it
>> *should* be able to do this, but sometimes theory and practice don't
>> perfectly overlap.
>>
>> Thanks,
>> Marco
>>
>> On Sat, Dec 11, 2021 at 4:37 AM Martin Verges 
>> wrote:
>>
>> > Hello,
>> >
>> > avoid size 2 whenever you can. As long as you know that you might lose
>> > data, it can be an acceptable risk while migrating the cluster. We had
>> that
>> > in the past multiple time and it is a valid use case in our opinion.
>> > However make sure to monitor the state and recover as fast as possible.
>> > Leave min_size on 2 as well and accept the potential downtime!
>> >
>> > --
>> > Martin Verges
>> > Managing director
>> >
>> > Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
>> >
>> > croit GmbH, Freseniusstr. 31h, 81247 Munich
>> > CEO: Martin Verges - VAT-ID: DE310638492
>> > Com. register: Amtsgericht Munich HRB 231263
>> > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>> >
>> >
>> > On Fri, 10 Dec 2021 at 18:05, Marco Pizzolo 
>> > wrote:
>> >
>> >> Hello,
>> >>
>> >> As part of a migration process where we will be swinging Ceph hosts
>> from
>> >> one cluster to another we need to reduce the size from 3 to 2 in order
>> to
>> >> shrink the footprint sufficiently to allow safe removal of an OSD/Mon
>> >> node.
>> >>
>> >> The cluster has about 500M objects as per dashboard, and is about
>> 1.5PB in
>> >> size comprised solely of small files served through CephFS to Samba.
>> >>
>> >> Has anyone encountered a similar situation?  What (if any) problems did
>> >> you
>> >> face?
>> >>
>> >> Ceph 14.2.22 bare metal deployment on Centos.
>> >>
>> >> Thanks in advance.
>> >>
>> >> Marco
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] NFS-ganesha .debs not on download.ceph.com

2021-12-15 Thread Richard Zak
I've got Ceph running on Ubuntu 20.04 using Ceph-ansible, and I noticed
that the .deb files for NFS-ganesha aren't on download.ceph.com. It seems
the files should be here:
https://download.ceph.com/nfs-ganesha/deb-V3.5-stable/pacific but
"deb-V3.5-stable" doesn't exist. Poking around, I can see there's no debian
repo for NFA-ganesha for pacific. Is this an error, or should ceph-ansible
be configured to look elsewhere for the repo?

-- 
Regards,

Richard J. Zak
Professional Genius
PGP Key: https://keybase.io/rjzak/key.asc
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS Metadata Pool bandwidth usage

2021-12-15 Thread Andras Sali
Hi Xiubo,

Thanks very much for looking into this; that does sound like what might
be happening in our case.

Is this something that can be improved somehow - would disabling pinning or
some config change help? Or could this be addressed in a future release?

It seems somewhat excessive to write so much metadata for each operation; it
makes metadata operations constrained by disk capacity. We were happy
to use pinning as it's very natural for us and seems to make the FS more
stable; however, the metadata bandwidth usage is becoming a real issue.
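
In case it is useful, the number of subtrees on rank 0 (which, per the
tracker, drives the ESubtreeMap size) can be checked via the admin socket -
a sketch, with the MDS name as a placeholder:

# count the subtrees currently held by this MDS
ceph daemon mds.<name> get subtrees | jq 'length'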

Thanks very much for your help,

Kind regards,

András





On Thu, Dec 16, 2021, 03:43 Xiubo Li  wrote:

> I have figured out one case that may cause this, please see the tracker
> https://tracker.ceph.com/issues/53623.
>
> Just in case there is a large number of subtrees in mds.0, the size of the
> ESubtreeMap event could reach up to 4MB. Then each LogSegment may contain
> only 2 events, which means that when an MDS submits each new event, it may
> use up a whole LogSegment.
>
> I think most of the bandwidth usage could be from writing the ESubtreeMap
> events to the metadata pool.
>
> -- Xiubo
>
>
>
>
> On 12/13/21 9:52 PM, Gregory Farnum wrote:
>
> I see Xiubo started discussing this on https://tracker.ceph.com/issues/53542
> as well.
>
> So the large writes are going to the journal file, and sometimes it's a
> single write of a full segment size, which is what I was curious
> about.
>
> At this point the next step is seeing what is actually taking up that
> space. You could turn up logging and send in a snippet, but I think
> the easiest thing is going to involve:
> * track one of those 4 MB full-object writes
> Either a) pull the object in question off disk and look at it using
> ceph-dencoder, or
> b) Use cephfs-journal-tool to inspect the relevant journal range
>
> From your output below you could grab 200.02c7a084 (which is at
> journal offset 0x2c7a084*4MiB), though that's probably been expired by
> this point so you'll need to get another dump which contains a large
> one. I haven't looked at these data structures using these tools in a
> while so I'll leave more detail up to Xiubo.
> -Greg
>
> On Fri, Dec 10, 2021 at 12:48 AM Andras Sali  
>  wrote:
>
> Hi Greg,
>
> As a follow up, we see items similar to this pop up in the objecter_requests 
> (when it's not empty). Not sure if reading it right, but some appear quite 
> large (in the MB range?):
>
> {
> "ops": [
> {
> "tid": 9532804,
> "pg": "3.f9c235d7",
> "osd": 2,
> "object_id": "200.02c7a084",
> "object_locator": "@3",
> "target_object_id": "200.02c7a084",
> "target_object_locator": "@3",
> "paused": 0,
> "used_replica": 0,
> "precalc_pgid": 0,
> "last_sent": "1121127.434264s",
> "age": 0.0160001041,
> "attempts": 1,
> "snapid": "head",
> "snap_context": "0=[]",
> "mtime": "2021-12-10T08:35:34.582215+",
> "osd_ops": [
> "write 0~4194304 [fadvise_dontneed] in=4194304b"
> ]
> },
> {
> "tid": 9532806,
> "pg": "3.abba2e66",
> "osd": 2,
> "object_id": "200.02c7a085",
> "object_locator": "@3",
> "target_object_id": "200.02c7a085",
> "target_object_locator": "@3",
> "paused": 0,
> "used_replica": 0,
> "precalc_pgid": 0,
> "last_sent": "1121127.438264s",
> "age": 0.012781,
> "attempts": 1,
> "snapid": "head",
> "snap_context": "0=[]",
> "mtime": "2021-12-10T08:35:34.589044+",
> "osd_ops": [
> "write 0~1236893 [fadvise_dontneed] in=1236893b"
> ]
> },
> {
> "tid": 9532807,
> "pg": "3.abba2e66",
> "osd": 2,
> "object_id": "200.02c7a085",
> "object_locator": "@3",
> "target_object_id": "200.02c7a085",
> "target_object_locator": "@3",
> "paused": 0,
> "used_replica": 0,
> "precalc_pgid": 0,
> "last_sent": "1121127.442264s",
> "age": 0.0085206,
> "attempts": 1,
> "snapid": "head",
> "snap_context": "0=[]",
> "mtime": "2021-12-10T08:35:34.592283+",
> "osd_ops": [
> "write 1236893~510649 [fadvise_dontneed] in=510649b"
> ]
> },
> {
> "tid": 9532808,
> "pg": "3.abba2e66",
> "osd": 2,
> "object_id": "200.02c7a085",
> "object_locator": "@3",
> "target_object_id": "200.02c7a085",
> "target_object_locator": "@3",
>   

[ceph-users] Re: RBD mirroring bootstrap peers - direction

2021-12-15 Thread Torkil Svensgaard

On 12/15/21 14:18, Arthur Outhenin-Chalandre wrote:

On 12/15/21 13:50, Torkil Svensgaard wrote:

Ah, so as long as I don't run the mirror daemons on site-a there is no
risk of overwriting production data there?


To be perfectly clear there should be no risk whatsoever (as Ilya also
said). I suggested to not run rbd-mirror on site-a so that replication
from site-b to site-a wouldn't be a thing at all.

That being said we also run a setup where we only need one way
replication but for the same reasons posted by Ilya we use rx-tx and run
rbd-mirror in both sites.


Hi Arthur

Thanks for the clarification.

I set up one peer with rx-tx, and it seems to replicate as it should, 
but the site-a status looks a little odd. Why down+unknown and status 
not found? Because of rx-tx peer with only one way active?


site-a:
rbd mirror image status rbd_internal/store
   athos: Thu Dec 16 07:51:26 2021

store:
  global_id:   4888eab6-f6f4-4a11-9e91-e3446651f911
  state:       down+unknown
  description: status not found
  last_update:
  peer_sites:
    name: dcn-ceph
    state: up+replaying
    description: replaying, {"bytes_per_second":272384136.53,"bytes_per_snapshot":0.0,"remote_snapshot_timestamp":1639585129,"replay_state":"syncing","seconds_until_synced":0,"syncing_percent":12,"syncing_snapshot_timestamp":1639585129}
    last_update: 2021-12-16 07:51:04
  snapshots:
    20 .mirror.primary.4888eab6-f6f4-4a11-9e91-e3446651f911.137a0eeb-c54d-4f0f-9cb1-3cdc87a891d4 (peer_uuids:[bf1978c0-4231-43ca-831f-669bf4a898b2])

site-b:
rbd mirror image status rbd_internal/store
  dcn-ceph-01: Thu Dec 16 06:53:46 2021

store:
  global_id:   4888eab6-f6f4-4a11-9e91-e3446651f911
  state:       up+replaying
  description: replaying, {"bytes_per_second":252701491.2,"bytes_per_snapshot":0.0,"remote_snapshot_timestamp":1639585129,"replay_state":"syncing","seconds_until_synced":0,"syncing_percent":13,"syncing_snapshot_timestamp":1639585129}
  service:     dcn-ceph-01.itashe on dcn-ceph-01
  last_update: 2021-12-16 06:53:34

Here's one with rx only. That looks more peaceful:

site-a:
rbd mirror image status rbd/mail
   athos: Thu Dec 16 07:58:10 2021

mail:
  global_id:   2b3d355c-d095-45a4-8c29-80f059d78483
  snapshots:
    116 .mirror.primary.2b3d355c-d095-45a4-8c29-80f059d78483.82d2240a-319f-4990-84e8-98284864032c (peer_uuids:[cabf78ce-f65f-4a27-a648-20b3fd326647])


site-b:
rbd mirror image status rbd/mail
  dcn-ceph-01: Thu Dec 16 06:57:22 2021

mail:
  global_id:   2b3d355c-d095-45a4-8c29-80f059d78483
  state:       up+replaying
  description: replaying, {"bytes_per_second":0.0,"bytes_per_snapshot":1140323336192.0,"local_snapshot_timestamp":1639573987,"remote_snapshot_timestamp":1639573987,"replay_state":"idle"}
  service:     dcn-ceph-01.itashe on dcn-ceph-01
  last_update: 2021-12-16 06:57:07

Mvh.

Torkil

--
Torkil Svensgaard
Sysadmin
MR-Forskningssektionen, afs. 714
DRCMR, Danish Research Centre for Magnetic Resonance
Hvidovre Hospital
Kettegård Allé 30
DK-2650 Hvidovre
Denmark
Tel: +45 386 22828
E-mail: tor...@drcmr.dk
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io