[ceph-users] Re: cephfs: massive drop in MDS requests per second with increasing number of caps

2021-01-20 Thread Dietmar Rieder

Hi Frank,

yes, I ran several tests going down from the default 1M through 256k, 128k and 96k 
to 64k, which seemed to be optimal in our case.


~Best
   Dietmar

On 1/20/21 1:01 PM, Frank Schilder wrote:

Hi Dietmar,

thanks for that. I reduced the value and, indeed, the number of caps clients 
were holding started going down.

A question about the particular value of 64K. Did you run several tests and 
find this one to be optimal, or was it just a lucky guess?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dietmar Rieder 
Sent: 19 January 2021 13:24:15
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
with increasing number of caps

Hi Frank,

you don't need to remount the fs. The kernel driver should react to the
change on the MDS.

Best
Dietmar

On 1/19/21 9:07 AM, Frank Schilder wrote:

Hi Dietmar,

thanks for discovering this. I also observed in the past that clients can 
become unbearably slow for no apparent reason. I never managed to reproduce 
this and, therefore, didn't report it here.

A question about setting these flags on an existing mount. Will a "mount -o remount 
/mnt/cephfs" update client settings from the cluster without interrupting I/O? I 
couldn't find anything regarding updating config settings in the manual pages.

I would be most interested in further updates in this matter and also if you 
find other flags with positive performance impact.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dietmar Rieder 
Sent: 18 January 2021 21:01:50
To: ceph-users@ceph.io
Subject: [ceph-users] Re: cephfs: massive drop in MDS requests per second with 
increasing number of caps

Hi Burkhard, hi list,


I checked the 'mds_max_caps_per_client' setting and it turned out that
it was set to the default value of 1 million. The
'mds_cache_memory_limit' setting, however, I had previously set to 40GB.


Given this, I now started to play around with the max_caps and set
'mds_max_caps_per_client' to 64k:

# ceph config set mds mds_max_caps_per_client 65536

And this resulted in a much better and stable performance of ~1.4k
req/sec from one client and ~2.9k req/sec when running 2 clients in
parallel. Remember, it was max ~660 req/sec before with the 1M default,
and it gradually decreased to ~60 req/sec after some minutes, never
getting higher again unless manually dropping the dentries and inodes
from the VM cache on the client. (I guess this is because only 5000 caps
are recalled after reaching the mds_max_caps_per_client limit.)
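
For reference, a minimal sketch of how I applied and verified the setting and
how the per-session cap counts can be watched (the MDS daemon name is just a
placeholder, adjust it to your cluster):

# ceph config set mds mds_max_caps_per_client 65536
# ceph config get mds mds_max_caps_per_client
# ceph tell mds.cephmds-a session ls | grep -E '"id"|num_caps'

The last command lists the session ids together with the number of caps each
client session currently holds, which makes the effect of the limit easy to
follow over time.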

I'll keep this for now and observe if it has any impact on other
operations or situations.

Still I wonder why a higher number (i.e. >64k) of caps on the client
destroys the performance completely.

Thanks again
 Dietmar

On 1/18/21 6:20 PM, Dietmar Rieder wrote:

Hi Burkhard,

thanks so much for the quick reply and the explanation and suggestions.
I'll check these settings and eventually change them and report back.

Best
 Dietmar

On 1/18/21 6:00 PM, Burkhard Linke wrote:

Hi,

On 1/18/21 5:46 PM, Dietmar Rieder wrote:

Hi all,

we noticed a massive drop in the requests per second that a cephfs client is
able to perform when we do a recursive chown over a directory with
millions of files. As soon as we see about 170k caps on the MDS, the
client performance drops from about 660 reqs/sec to 70 reqs/sec.

When we then clear dentries and inodes using "sync; echo 2 >
/proc/sys/vm/drop_caches" on the client, the requests go up to ~660
again, just to drop again when reaching about 170k caps.

See the attached screenshots.

When we stop the chown process for a while and restart it ~25 min
later, it still performs very slowly and the MDS reqs/sec remain
low (~60/sec). Clearing the cache (dentries and inodes) on the
client restores the performance again.

When we run the same chown on another client in parallel, it starts
again with reasonably good performance (while the first client is
performing poorly), but eventually it gets slow again, just like the
first client.

Can someone comment on this and explain it?
How can this be solved, so that the performance remains stable?


The MDS has a (soft) limit for the number of caps per client. If a client
starts to request more caps, the MDS will ask it to release caps.
This will add an extra network round trip, thus increasing processing
time. The setting is 'mds_max_caps_per_client'. The default value is 1
million caps per client, but maybe this setting was changed in your
configuration, or the overall cap limit for the MDS is restricting it.


Since each assigned cap increases the memory consumption of the MDS,
setting an upper limit helps to control the overall amount of memory
the MDS is using. So the memory target also affects the number of
active caps an MDS can manage. You need to adjust both values to your
use case
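
Both values can be adjusted and inspected at runtime, e.g. (a sketch only; the
40 GiB figure is merely an example and the daemon name is a placeholder):

# ceph config set mds mds_cache_memory_limit 42949672960
# ceph config show mds.cephmds-a | grep -E 'mds_cache_memory_limit|mds_max_caps_per_client'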

[ceph-users] RBD on windows

2021-01-20 Thread Szabo, Istvan (Agoda)
Hi,

I'm looking at the SUSE documentation regarding their option to have RBD on Windows.
I want to try it on a Windows Server 2019 VM, but I got this error:

PS C:\Users\$admin$> rbd create image01 --size 4096 --pool windowstest -m 
10.118.199.248,10.118.199.249,10.118.199.250 --id windowstest --keyring 
C:/ProgramData/ceph/keyring
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: error parsing file 
C:/ProgramData/ceph/keyring: cannot parse buffer: Malformed input
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: failed to load 
C:/ProgramData/ceph/keyring: (5) Input/output error
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: error parsing file 
C:/ProgramData/ceph/keyring: cannot parse buffer: Malformed input
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: failed to load 
C:/ProgramData/ceph/keyring: (5) Input/output error
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: error parsing file 
C:/ProgramData/ceph/keyring: cannot parse buffer: Malformed input
rbd: couldn't connect to the cluster!
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 auth: failed to load 
C:/ProgramData/ceph/keyring: (5) Input/output error
2021-01-20T11:15:29.066SE Asia Standard Time 1 -1 monclient: keyring not found

This is the keyring file:

[client.windowstest]
key = AQBJ7wdgdWLIMhAAle+/pg+26XvWsDv8PyPcvw==
caps mon = "allow rw"
caps osd = "allow rwx pool=windowstest"

And this is the ceph.conf file on the windows client:
[global]
 log to stderr = true
 run dir = C:/ProgramData/ceph
 crash dir = C:/ProgramData/ceph
[client]
 keyring = C:/ProgramData/ceph/keyring
 log file = C:/ProgramData/ceph/$name.$pid.log
 admin socket = C:/ProgramData/ceph/$name.$pid.asok
[global]
 mon host = [v2:10.118.199.231:3300,v1:10.118.199.231:6789] 
[v2:10.118.199.232:3300,v1:10.118.199.232:6789] 
[v2:10.118.199.233:3300,v1:10.118.199.233:6789]

Commands I've tried:
rbd create image01 --size 4096 --pool windowstest -m 
10.118.199.248,10.118.199.249,10.118.199.250 --id windowstest --keyring 
C:/ProgramData/ceph/keyring
rbd create image01 --size 4096 --pool windowstest -m 
10.118.199.248,10.118.199.249,10.118.199.250 --id windowstest --keyring 
C:\ProgramData\ceph\keyring
rbd create image01 --size 4096 --pool windowstest -m 
10.118.199.248,10.118.199.249,10.118.199.250 --id windowstest --keyring 
"C:/ProgramData/ceph/keyring"
rbd create image01 --size 4096 --pool windowstest -m 
10.118.199.248,10.118.199.249,10.118.199.250 --id windowstest --keyring 
"C:\ProgramData\ceph\keyring"
rbd create blank_image --size=1G

The ceph version is luminous 12.2.8.

I don't know why it can't find the mon keyring.

Thank you.




[ceph-users] Re: cephfs: massive drop in MDS requests per second with increasing number of caps

2021-01-20 Thread Frank Schilder
Hi Dietmar,

thanks for that. I reduced the value and, indeed, the number of caps clients 
were holding started going down.

A question about the particular value of 64K. Did you run several tests and 
find this one to be optimal, or was it just a lucky guess?

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dietmar Rieder 
Sent: 19 January 2021 13:24:15
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
with increasing number of caps

Hi Frank,

you don't need to remount the fs. The kernel driver should react to the
change on the MDS.

Best
   Dietmar

On 1/19/21 9:07 AM, Frank Schilder wrote:
> Hi Dietmar,
>
> thanks for discovering this. I also observed in the past that clients can 
> become unbearably slow for no apparent reason. I never managed to reproduce 
> this and, therefore, didn't report it here.
>
> A question about setting these flags on an existing mount. Will a "mount -o 
> remount /mnt/cephfs" update client settings from the cluster without 
> interrupting I/O? I couldn't find anything regarding updating config settings 
> in the manual pages.
>
> I would be most interested in further updates in this matter and also if you 
> find other flags with positive performance impact.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dietmar Rieder 
> Sent: 18 January 2021 21:01:50
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
> with increasing number of caps
>
> Hi Burkhard, hi list,
>
>
> I checked the 'mds_max_caps_per_client' setting and it turned out that
> it was set to the default value of 1 million. The
> 'mds_cache_memory_limit' setting, however I had previously set to 40GB.
>
>
> Given this, I now started to play around with the max_caps and set
> 'mds_max_caps_per_client' 64k:
>
> # ceph config set mds mds_max_caps_per_client 65536
>
> And this resulted in a much better and stable performance of ~1.4k
> req/sec from one client and ~2.9k req/sec when running 2 clients in
> parallel. Remember, it was max ~660 req/sec before with the 1M default,
> and it gradually decreased to ~60 req/sec after some minutes, never
> getting higher again unless manually dropping the dentries and inodes
> from the VM cache on the client. (I guess this is because only 5000 caps
> are recalled after reaching the mds_max_caps_per_client limit.)
>
> I'll keep this for now and observe if it has any impact on other
> operations or situations.
>
> Still I wonder why a higher number (i.e. >64k) of caps on the client
> destroys the performance completely.
>
> Thanks again
> Dietmar
>
> On 1/18/21 6:20 PM, Dietmar Rieder wrote:
>> Hi Burkhard,
>>
>> thanks so much for the quick reply and the explanation and suggestions.
>> I'll check these settings and eventually change them and report back.
>>
>> Best
>> Dietmar
>>
>> On 1/18/21 6:00 PM, Burkhard Linke wrote:
>>> Hi,
>>>
>>> On 1/18/21 5:46 PM, Dietmar Rieder wrote:
 Hi all,

 we noticed a massive drop in requests per second a cephfs client is
 able to perform when we do a recursive chown over a directory with
 millions of files. As soon as we see about 170k caps on the MDS, the
 client performance drops from about 660 reqs/sec to 70 reqs/sec.

 When we then clear dentries and inodes using "sync; echo 2 >
 /proc/sys/vm/drop_caches" on the client, the request go up to ~660
 again just to drop again when reaching about 170k caps.

 See the attached screenshots.

 When we stop the chown process for a while and restart it ~25min
 later again it still performs very slowly and the MDS reqs/sec remain
 low (~60/sec.). Clearing the cache (dentries and inodes) on the
 client restores the performance again.

 When we run the same chown on another client in parallel, it starts
 again with reasonable good performance (while the first client is
 poorly performing) but eventually it gets slow again just like the
 first client.

 Can someone comment on this and explain it?
 How can this be solved, so that the performance remains stable?
>>>
>>> The MDS has a (soft) limit for the number of caps per client. If a client
>>> starts to request more caps, the MDS will ask it to release caps.
>>> This will add an extra network round trip, thus increasing processing
>>> time. The setting is 'mds_max_caps_per_client'. The default value is 1
>>> million caps per client, but maybe this setting was changed in your
>>> configuration, or the overall cap limit for the MDS is restricting it.
>>>
>>>
>>> Since each assigned cap increases the memory consumption of the MDS,
>>> setting an upper limit helps to control the overall amount of memory
>>> the MDS is using. So the memory target also affects the number of
>>> active caps an MDS can manage. You need to adjust both values to your
>>> use case.

[ceph-users] cephadm db_slots and wal_slots ignored

2021-01-20 Thread Schweiss, Chip
I'm trying to set up a new ceph cluster with cephadm on a SUSE SES trial
that has Ceph 15.2.8

Each OSD node has 18 rotational SAS disks, 4 NVMe 2TB SSDs for DB, and 2
NVMe 200GB Optane SSDs for WAL.

These servers will eventually have 24 rotational SAS disks that they will
inherit from existing storage servers.  So I don't want all the space used
on the DB and WAL SSDs.

I suspect from the comment "(db_slots is actually to be favoured here, but
it's not implemented yet)" on this page in the docs:
https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-advanced-case that these
parameters are not yet implemented, even though they are documented under
"ADDITIONAL OPTIONS".

My osd_spec.yml:
service_type: osd
service_id: three_tier_osd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
  model: 'ST14000NM0288'
db_devices:
  rotational: 0
  model: 'INTEL SSDPE2KX020T8'
  limit: 6
wal_devices:
  model: 'INTEL SSDPEL1K200GA'
  limit: 12
db_slots: 6
wal_slots: 12

All available space is consumed on my DB and WAL SSDs with only 18 OSDs,
leaving no room to add additional spindles.

Is this still a work in progress, or a bug I should report?  Possibly related
to https://github.com/rook/rook/issues/5026. At a minimum, this appears
to be a documentation bug.

How can I work around this?

-Chip


[ceph-users] RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
I have been doing some testing with RBD-Mirror Snapshots to a remote Ceph 
cluster. 

Does anyone know if the images on the remote cluster can be utilized in any way? 
Would love the ability to clone them, or even read-only would be nice. 


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread David Caro

Have you tried just using them?
(RO, if you do RW things might go crazy, would be nice to try though).

You might be able to create a clone too, and I guess worst case just cp/deep
cp.

I'm interested in your findings, btw. I'd be grateful if you share them :)


Thanks!

On 01/20 14:23, Adam Boyhan wrote:
> I have been doing some testing with RBD-Mirror Snapshots to a remote Ceph 
> cluster. 
> 
> Does anyone know if the images on the remote cluster can be utilized in 
> anyway? Would love the ability to clone them, or even readonly would be nice. 

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation 
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."




[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
Two separate 4-node clusters with 10 OSDs in each node. Micron 9300 NVMes are 
the OSD drives. Heavily based on the Micron/Supermicro white papers. 

When I attempt to protect the snapshot on a remote image, it errors with read 
only. 

root@Bunkcephmon2:~# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1 
rbd: protecting snap failed: (30) Read-only file system 


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Eugen Block
But you should be able to clone the mirrored snapshot on the remote  
cluster even though it’s not protected, IIRC.



Zitat von Adam Boyhan :

Two separate 4 node clusters with 10 OSD's in each node. Micron 9300  
NVMe's are the OSD drives. Heavily based on the Micron/Supermicro  
white papers.


When I attempt to protect the snapshot on a remote image, it errors  
with read only.


root@Bunkcephmon2:~# rbd snap protect  
CephTestPool1/vm-100-disk-0@TestSnapper1

rbd: protecting snap failed: (30) Read-only file system





[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
That's what I thought as well, especially based on this. 



Note 

You may clone a snapshot from one pool to an image in another pool. For 
example, you may maintain read-only images and snapshots as templates in one 
pool, and writeable clones in another pool. 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
0x55c7cf8417f0 validate_parent: parent snapshot must be protected 

root@Bunkcephmon2:~# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1 
rbd: protecting snap failed: (30) Read-only file system 


From: "Eugen Block"  
To: "adamb"  
Cc: "ceph-users" , "Matt Wilder"  
Sent: Wednesday, January 20, 2021 3:00:54 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

But you should be able to clone the mirrored snapshot on the remote 
cluster even though it’s not protected, IIRC. 


Zitat von Adam Boyhan : 

> Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> white papers. 
> 
> When I attempt to protect the snapshot on a remote image, it errors 
> with read only. 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Jason Dillaman
On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote:
>
> That's what I thought as well, especially based on this.
>
>
>
> Note
>
> You may clone a snapshot from one pool to an image in another pool. For 
> example, you may maintain read-only images and snapshots as templates in one 
> pool, and writeable clones in another pool.
>
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE
> 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> 0x55c7cf8417f0 validate_parent: parent snapshot must be protected
>
> root@Bunkcephmon2:~# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1
> rbd: protecting snap failed: (30) Read-only file system

You have two options: (1) protect the snapshot on the primary image so
that the protection status replicates or (2) utilize RBD clone v2
which doesn't require protection but does require Mimic or later
clients [1].
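
For example (a sketch only, reusing the pool/image names from the quoted
output; if I'm not mistaken, the clone format can also be passed per command
instead of via ceph.conf):

Option (1), run on the primary cluster so the protection status replicates:
# rbd snap protect CephTestPool1/vm-100-disk-0@TestSnapper1

Option (2), clone v2, no protection required (Mimic or later clients):
# rbd clone --rbd-default-clone-format 2 \
    CephTestPool1/vm-100-disk-0@TestSnapper1 CephTestPool2/vm-100-disk-0-CLONE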

>
> From: "Eugen Block" 
> To: "adamb" 
> Cc: "ceph-users" , "Matt Wilder" 
> Sent: Wednesday, January 20, 2021 3:00:54 PM
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
>
> But you should be able to clone the mirrored snapshot on the remote
> cluster even though it’s not protected, IIRC.
>
>
> Zitat von Adam Boyhan :
>
> > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300
> > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro
> > white papers.
> >
> > When I attempt to protect the snapshot on a remote image, it errors
> > with read only.
> >
> > root@Bunkcephmon2:~# rbd snap protect
> > CephTestPool1/vm-100-disk-0@TestSnapper1
> > rbd: protecting snap failed: (30) Read-only file system

[1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/

-- 
Jason


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Adam Boyhan
Awesome information. I knew I had to be missing something. 

All of my clients will be far newer than Mimic, so I don't think that will be an 
issue. 

Added the following to my ceph.conf on both clusters. 

rbd_default_clone_format = 2 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
root@Bunkcephmon2:~# rbd ls CephTestPool2 
vm-100-disk-0-CLONE 

I am sure I will be back with more questions. Hoping to replace our Nimble 
storage with Ceph and NVMe. 

Appreciate it! 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:28:39 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> 
> That's what I thought as well, especially based on this. 
> 
> 
> 
> Note 
> 
> You may clone a snapshot from one pool to an image in another pool. For 
> example, you may maintain read-only images and snapshots as templates in one 
> pool, and writeable clones in another pool. 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 

You have two options: (1) protect the snapshot on the primary image so 
that the protection status replicates or (2) utilize RBD clone v2 
which doesn't require protection but does require Mimic or later 
clients [1]. 

> 
> From: "Eugen Block"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Wednesday, January 20, 2021 3:00:54 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> But you should be able to clone the mirrored snapshot on the remote 
> cluster even though it’s not protected, IIRC. 
> 
> 
> Zitat von Adam Boyhan : 
> 
> > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> > white papers. 
> > 
> > When I attempt to protect the snapshot on a remote image, it errors 
> > with read only. 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 

[1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/ 

-- 
Jason 


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-20 Thread Matt Wilder
Can you describe your Ceph deployment?

On Wed, Jan 20, 2021 at 11:24 AM Adam Boyhan  wrote:

> I have been doing some testing with RBD-Mirror Snapshots to a remote Ceph
> cluster.
>
> Does anyone know if the images on the remote cluster can be utilized in
> any way? Would love the ability to clone them, or even read-only would be
> nice.



[ceph-users] Re: cephfs: massive drop in MDS requests per second with increasing number of caps

2021-01-20 Thread Frank Schilder
Hi Dietmar,

thanks for that info. I repeated a benchmark test we were using when trying to 
find out what the problem was. It's un-taring an anaconda2 archive, which 
produces a high mixed load on the file system. I remember that it used to take 
ca. 4 minutes on a freshly mounted client. After reducing the cache value, it 
takes only 2 minutes.

The reason we used this as a benchmark is that a user observed that the 
extraction stalls after a while. Back then we couldn't reliably reproduce the 
problem, so we didn't really know what to look for. It was really strange. When 
I did it as myself, it was fast. When I did it as the other user on the same 
server, it stalled. It usually happened around the same place, when hard links 
were created. The ceph documentation mentions that hard links are a challenge, 
and we didn't have time to dig deeper into it.

I don't have any observations about possible slow-downs as seen before, but it's 
only been a day in production for now. I hope the reduction doesn't have any 
negative side effects. So far, I haven't received any complaints.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Dietmar Rieder 
Sent: 20 January 2021 14:18:19
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
with increasing number of caps

Hi Frank,

yes, I ran several tests going down from the default 1M through 256k, 128k and 96k
to 64k, which seemed to be optimal in our case.

~Best
Dietmar

On 1/20/21 1:01 PM, Frank Schilder wrote:
> Hi Dietmar,
>
> thanks for that. I reduced the value and, indeed, the number of caps clients 
> were holding started going down.
>
> A question about the particular value of 64K. Did you run several tests and 
> find this one to be optimal, or was it just a lucky guess?
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Dietmar Rieder 
> Sent: 19 January 2021 13:24:15
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
> with increasing number of caps
>
> Hi Frank,
>
> you don't need to remount the fs. The kernel driver should react to the
> change on the MDS.
>
> Best
> Dietmar
>
> On 1/19/21 9:07 AM, Frank Schilder wrote:
>> Hi Dietmar,
>>
>> thanks for discovering this. I also observed in the past that clients can 
>> become unbearably slow for no apparent reason. I never managed to reproduce 
>> this and, therefore, didn't report it here.
>>
>> A question about setting these flags on an existing mount. Will a "mount -o 
>> remount /mnt/cephfs" update client settings from the cluster without 
>> interrupting I/O? I couldn't find anything regarding updating config 
>> settings in the manual pages.
>>
>> I would be most interested in further updates in this matter and also if you 
>> find other flags with positive performance impact.
>>
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Dietmar Rieder 
>> Sent: 18 January 2021 21:01:50
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] Re: cephfs: massive drop in MDS requests per second 
>> with increasing number of caps
>>
>> Hi Burkhard, hi list,
>>
>>
>> I checked the 'mds_max_caps_per_client' setting and it turned out that
>> it was set to the default value of 1 million. The
>> 'mds_cache_memory_limit' setting, however I had previously set to 40GB.
>>
>>
>> Given this, I now started to play around with the max_caps and set
>> 'mds_max_caps_per_client' 64k:
>>
>> # ceph config set mds mds_max_caps_per_client 65536
>>
>> And this resulted in a much better and stable performance of ~1.4k
>> req/sec from one client and ~2.9k req/sec when running 2 clients in
>> parallel. Remember, it was max ~660 req/sec before with the 1M default,
>> and it gradually decreased to ~60 req/sec after some minutes, never
>> getting higher again unless manually dropping the dentries and inodes
>> from the VM cache on the client. (I guess this is because only 5000 caps
>> are recalled after reaching the mds_max_caps_per_client limit.)
>>
>> I'll keep this for now and observe if it has any impact on other
>> operations or situations.
>>
>> Still I wonder why a higher number (i.e. >64k) of caps on the client
>> destroys the performance completely.
>>
>> Thanks again
>>  Dietmar
>>
>> On 1/18/21 6:20 PM, Dietmar Rieder wrote:
>>> Hi Burkhard,
>>>
>>> thanks so much for the quick reply and the explanation and suggestions.
>>> I'll check these settings and eventually change them and report back.
>>>
>>> Best
>>>  Dietmar
>>>
>>> On 1/18/21 6:00 PM, Burkhard Linke wrote:
 Hi,

 On 1/18/21 5:46 PM, Dietmar Rieder wrote:
> Hi all,
>
> we noticed a massive drop in requests per second a cephfs client is

[ceph-users] Large rbd

2021-01-20 Thread Chris Dunlop

Hi,

What limits are there on the "reasonable size" of an rbd?

E.g. when I try to create a 1 PB rbd with default 4 MiB objects on my 
octopus cluster:


$ rbd create --size 1P --data-pool rbd.ec rbd.meta/fs
2021-01-20T18:19:35.799+1100 7f47a99253c0 -1 librbd::image::CreateRequest: 
validate_layout: image size not compatible with object map

...which comes from:

== src/librbd/image/CreateRequest.cc
bool validate_layout(CephContext *cct, uint64_t size, file_layout_t &layout) {
  if (!librbd::ObjectMap<>::is_compatible(layout, size)) {
    lderr(cct) << "image size not compatible with object map" << dendl;
    return false;
  }

== src/librbd/ObjectMap.cc
template <typename I>
bool ObjectMap<I>::is_compatible(const file_layout_t& layout, uint64_t size) {
  uint64_t object_count = Striper::get_num_objects(layout, size);
  return (object_count <= cls::rbd::MAX_OBJECT_MAP_OBJECT_COUNT);
}

== src/cls/rbd/cls_rbd_types.h
static const uint32_t MAX_OBJECT_MAP_OBJECT_COUNT = 256000000;

For 4 MiB objects, that object count equates to just over 976 TiB.

Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or is it 
just "this is crazy large, if you're trying to go over this you're doing 
something wrong, rethink your life..."?


Yes, I realise I can increase the size of the objects to get a larger rbd, 
or drop the object-map support (and the fast-diff that goes along with 
it).


I'm SO glad I found this limit now rather than starting on a smaller rbd 
and finding the limit when I tried to grow the rbd underneath a rapidly 
filling filesystem.


What else should I know?

Background: I currently have nearly 0.5 PB on XFS (on lvm / raid6) and ZFS 
that I'm looking to move over to ceph. XFS is a requirement, for the 
reflinking (sadly not yet available in CephFS: https://tracker.ceph.com/issues/1680). 
The recommendation for XFS is to start larger, on a thin-provisioned store 
(hello rbd!), rather than start smaller and grow as needed - e.g. see the 
thread surrounding:


https://www.spinics.net/lists/linux-xfs/msg20099.html

Rather than a single large rbd, should I be looking at multiple smaller 
rbds linked together using lvm or somesuch? What are the tradeoffs?


And whilst we're here... for an rbd with the data on an erasure-coded 
pool, how do you calculate the amount of rbd metadata required if/when the 
rbd data is fully allocated?



Cheers,

Chris


[ceph-users] Re: cephadm db_slots and wal_slots ignored

2021-01-20 Thread Eugen Block

If you use block_db_size and limit in your yaml file, e.g.

block_db_size: 64G  (or whatever you choose)
limit: 6

this should not consume the entire disk but only as much as you
configured. Can you check whether that works for you?
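
As a sketch (the sizes are only an example for the planned 24 HDDs per node,
and block_wal_size is my assumption for the matching WAL option), that could
look like:

# cat >> osd_spec.yml <<'EOF'
block_db_size: 320G
block_wal_size: 15G
EOF
# ceph orch apply osd -i osd_spec.yml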



Zitat von "Schweiss, Chip" :


I'm trying to set up a new ceph cluster with cephadm on a SUSE SES trial
that has Ceph 15.2.8

Each OSD node has 18 rotational SAS disks, 4 NVMe 2TB SSDs for DB, and 2
NVME2 200GB Optane SSDs for WAL.

These servers will eventually have 24 rotational SAS disks that they will
inherit from existing storage servers.  So I don't want all the space used
on the DB and WAL SSDs.

I suspect from the comment "(db_slots is actually to be favoured here, but
it's not implemented yet)" on this page in the docs:
https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-advanced-case these
parameters are not yet implemented, yet are documented as such under
"ADDITIONAL OPTIONS"

My osd_spec.yml:
service_type: osd
service_id: three_tier_osd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
  model: 'ST14000NM0288'
db_devices:
  rotational: 0
  model: 'INTEL SSDPE2KX020T8'
  limit: 6
wal_devices:
  model: 'INTEL SSDPEL1K200GA'
  limit: 12
db_slots: 6
wal_slots: 12

All available space is consumed on my DB and WAL SSDs with only 18 OSDs,
leaving no room to add additional spindles.

Is this still work in progress, or a bug I should report?  Possibly related
to https://github.com/rook/rook/issues/5026  At the minimum, this appears
to be a documentation bug.

How can I work around this?

-Chip


