[ceph-users] Re: Large rbd

2021-01-21 Thread Robert Sander
Hi,

Am 21.01.21 um 05:42 schrieb Chris Dunlop:

> Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
> it just "this is crazy large, if you're trying to go over this you're
> doing something wrong, rethink your life..."?

IMHO the limit is there because of the way deletion of RBDs works: "rbd
rm" has to look for every object, not only the ones that were actually
created. This would make deleting a very, very large RBD take a very, very
long time.

> Rather than a single large rbd, should I be looking at multiple smaller
> rbds linked together using lvm or somesuch? What are the tradeoffs?

IMHO there are no real tradeoffs; there could even be benefits to creating a
volume group with multiple physical volumes on RBD, as the requests can
be better parallelized (e.g. the virtio SCSI single controller for qemu).
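
As a rough sketch of that idea on a bare-metal client using krbd (image,
pool and VG names are made up; with qemu you would instead attach each
image as its own virtio-scsi disk):

# create and map a handful of smaller images instead of one huge one
for i in 1 2 3 4; do rbd create rbdpool/bigvol-$i --size 10T; done
for i in 1 2 3 4; do rbd map rbdpool/bigvol-$i; done

# build one volume group spanning all of them
pvcreate /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
vgcreate vg_bigvol /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
lvcreate -n lv_bigvol -l 100%FREE vg_bigvol
mkfs.xfs /dev/vg_bigvol/lv_bigvol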

Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread George Shuklin
I have a hell of a question: how do I put a cluster into HEALTH_ERR status 
without consequences?


I'm working on CI tests and I need to check whether our reaction to 
HEALTH_ERR is good. For this I need to take an empty cluster with an 
empty pool and do something to it. Preferably something quick and reversible.


For HEALTH_WARN the best thing I found is to change the pool size to 1; it 
raises the "1 pool(s) have no replicas configured" warning almost instantly 
and can be reverted very quickly for an empty pool.
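
For reference, that trigger is just the following (pool name is an example; 
newer releases may additionally require mon_allow_pool_size_one / 
--yes-i-really-mean-it):

ceph osd pool set testpool size 1   # raises "1 pool(s) have no replicas configured"
ceph osd pool set testpool size 3   # clears the warning again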


But HEALTH_ERR is a bit more tricky. Any ideas?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread Eugen Block

Hi,

For HEALTH_WARN the best thing I found is to change pool size to 1,  
it raises "1 pool(s) have no replicas configured" warning almost  
instantly and it can be reverted very quickly for empty pool.


Any OSD flag (noout, nodeep-scrub etc.) causes a health warning. ;-)
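
For example, a quick sketch on a test cluster:

ceph osd set noout     # HEALTH_WARN: "noout flag(s) set"
ceph osd unset noout   # back to HEALTH_OK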


But HEALTH_ERR is a bit more tricky. Any ideas?


I think if you set a very low quota for a pool (e.g. 1000 bytes or so)  
and fill it up, it should create a HEALTH_ERR status, IIRC.
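
Something along these lines (pool name is just an example):

ceph osd pool set-quota testpool max_bytes 1000   # tiny quota
rados -p testpool put obj1 /etc/hosts             # write something to exceed it
ceph health detail                                # check what status this produces
ceph osd pool set-quota testpool max_bytes 0      # remove the quota again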



Zitat von George Shuklin :

I have hell of the question: how to make HEALTH_ERR status for a  
cluster without consequences?


I'm working on CI tests and I need to check if our reaction to  
HEALTH_ERR is good. For this I need to take an empty cluster with an  
empty pool and do something. Preferably quick and reversible.


For HEALTH_WARN the best thing I found is to change pool size to 1,  
it raises "1 pool(s) have no replicas configured" warning almost  
instantly and it can be reverted very quickly for empty pool.


But HEALTH_ERR is a bit more tricky. Any ideas?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread George Shuklin

On 21/01/2021 13:02, Eugen Block wrote:

But HEALTH_ERR is a bit more tricky. Any ideas?


I think if you set a very low quota for a pool (e.g. 1000 bytes or so) 
and fill it up it should create a HEALTH_ERR status, IIRC. 
Cool idea. Unfortunately, even with a 1-byte quota (and some data in the 
pool), it's only HEALTH_WARN: "1 pool(s) full"

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread George Shuklin

On 21/01/2021 12:57, George Shuklin wrote:
I have hell of the question: how to make HEALTH_ERR status for a 
cluster without consequences?


I'm working on CI tests and I need to check if our reaction to 
HEALTH_ERR is good. For this I need to take an empty cluster with an 
empty pool and do something. Preferably quick and reversible.


For HEALTH_WARN the best thing I found is to change pool size to 1, it 
raises "1 pool(s) have no replicas configured" warning almost 
instantly and it can be reverted very quickly for empty pool.


But HEALTH_ERR is a bit more tricky. Any ideas?


I found a way:

ceph osd set-full-ratio 0.0

instantly causes

    health: HEALTH_ERR
    full ratio(s) out of order

even on an empty cluster. Problem solved.
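
For completeness, the revert (0.95 is the usual default; adjust to whatever 
your cluster had before):

ceph osd set-full-ratio 0.95   # back to HEALTH_OK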
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread Eugen Block

Oh really, I thought it would be an error. My bad.

There was an OSD flag "full" which is not usable anymore. I never used  
it, so I just tried it with a full OSD, which should lead to an error (and  
it does):


host:~ # ceph -s
  cluster:
id: 8f279f36-811c-3270-9f9d-58335b1bb9c0
health: HEALTH_ERR
1 full osd(s)
22 pool(s) full

I created a pool with 1 PG and size 1, created an RBD image in that  
pool and filled it up until the respective OSD was full. I have a  
virtual lab cluster with 20 GB OSDs, so it didn't take that long. If you  
try this, make sure to disable the pg-autoscaler on that pool, otherwise  
it will increase pg_num. Does that help?
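
Roughly, the steps above look like this (names and sizes are only examples; 
setting size 1 may need mon_allow_pool_size_one on newer releases):

ceph osd pool create fullpool 1 1
ceph osd pool set fullpool pg_autoscale_mode off   # keep pg_num at 1
ceph osd pool set fullpool size 1
rbd pool init fullpool
rbd create fullpool/filler --size 25G
rbd bench --io-type write --io-total 25G fullpool/filler   # fill until the OSD is full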


Regards,
Eugen


Zitat von George Shuklin :


On 21/01/2021 13:02, Eugen Block wrote:

But HEALTH_ERR is a bit more tricky. Any ideas?


I think if you set a very low quota for a pool (e.g. 1000 bytes or  
so) and fill it up it should create a HEALTH_ERR status, IIRC.
Cool idea. Unfortunately, even with 1 byte quota (and some data in  
the pool), it's HEALTH_WARN, 1 pool(s) full




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD-Mirror Mirror Snapshot stuck

2021-01-21 Thread Adam Boyhan
I have an rbd-mirror snapshot on one image that failed to replicate, and now it's 
not getting cleaned up. 

The cause of this was my own fault, based on my steps. I'm just trying to understand how 
to clean up/handle the situation. 

Here is how I got into this situation. 

- Created manual rbd snapshot on the image 
- On the remote cluster I cloned the snapshot 
- While cloned on the secondary cluster I made the mistake of deleting the 
snapshot on the primary 
- The subsequent mirror snapshot failed 
- I then removed the clone 
- The next mirror snapshot was successful but I was left with this mirror 
snapshot on the primary that I can't seem to get rid of 

root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10243 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e
 2 TiB Thu Jan 21 07:30:08 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have tried deleting the snap with "rbd snap rm" like normal user-created 
snaps, but no luck. Any way to force the deletion? 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to make HEALTH_ERR quickly and pain-free

2021-01-21 Thread Eugen Block
Oh that's better, I had to recreate my OSD because it didn't want to  
start anymore :-D



Zitat von George Shuklin :


On 21/01/2021 12:57, George Shuklin wrote:
I have hell of the question: how to make HEALTH_ERR status for a  
cluster without consequences?


I'm working on CI tests and I need to check if our reaction to  
HEALTH_ERR is good. For this I need to take an empty cluster with  
an empty pool and do something. Preferably quick and reversible.


For HEALTH_WARN the best thing I found is to change pool size to 1,  
it raises "1 pool(s) have no replicas configured" warning almost  
instantly and it can be reverted very quickly for empty pool.


But HEALTH_ERR is a bit more tricky. Any ideas?


I found the way:

ceph osd set-full-ratio 0.0

instantly causing

    health: HEALTH_ERR
    full ratio(s) out of order

even on empty cluster. Problem solved.



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
When cloning the snapshot on the remote cluster I can't see my ext4 filesystem. 

Using the same exact snapshot on both sides. Shouldn't this be consistent? 

Primary Site 
root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
TestSnapper1 
10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 

root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Ccscephtest1:~# mount /dev/nbd0 /usr2 

Secondary Site 
root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
TestSnapper1 
10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 




From: "adamb"  
To: "dillaman"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:42:46 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

Awesome information. I knew I had to be missing something. 

All of my clients will be far newer than mimic so I don't think that will be an 
issue. 

Added the following to my ceph.conf on both clusters. 

rbd_default_clone_format = 2 

root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool2/vm-100-disk-0-CLONE 
root@Bunkcephmon2:~# rbd ls CephTestPool2 
vm-100-disk-0-CLONE 

I am sure I will be back with more questions. Hoping to replace our Nimble 
storage with Ceph and NVMe. 

Appreciate it! 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Wednesday, January 20, 2021 3:28:39 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> 
> That's what I though as well, specially based on this. 
> 
> 
> 
> Note 
> 
> You may clone a snapshot from one pool to an image in another pool. For 
> example, you may maintain read-only images and snapshots as templates in one 
> pool, and writeable clones in another pool. 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> 
> root@Bunkcephmon2:~# rbd snap protect 
> CephTestPool1/vm-100-disk-0@TestSnapper1 
> rbd: protecting snap failed: (30) Read-only file system 

You have two options: (1) protect the snapshot on the primary image so 
that the protection status replicates or (2) utilize RBD clone v2 
which doesn't require protection but does require Mimic or later 
clients [1]. 

> 
> From: "Eugen Block"  
> To: "adamb"  
> Cc: "ceph-users" , "Matt Wilder"  
> Sent: Wednesday, January 20, 2021 3:00:54 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> But you should be able to clone the mirrored snapshot on the remote 
> cluster even though it’s not protected, IIRC. 
> 
> 
> Zitat von Adam Boyhan : 
> 
> > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> > white papers. 
> > 
> > When I attempt to protect the snapshot on a remote image, it errors 
> > with read only. 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 
> > ___ 
> > ceph-users mailing list -- ceph-users@ceph.io 
> > To unsubscribe send an email to ceph-users-le...@ceph.io 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 

[1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/ 

-- 
Jason 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Mirror Snapshot stuck

2021-01-21 Thread Adam Boyhan
I decided to request a resync to see the results. I have a very aggressive 
snapshot mirror schedule of 5 minutes, and replication just keeps starting on the 
latest snapshot before it finishes. I'm pretty sure this would just loop over and 
over if I didn't remove the schedule. 

root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
10883 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.7242f4d1-5203-4273-8b6d-ff4e1411216d
 2 TiB Thu Jan 21 08:50:08 2021 mirror (primary peer_uuids:[]) 
10923 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.d0c3c2e7-880b-4e62-90cc-fd501e9a87c9
 2 TiB Thu Jan 21 08:55:11 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 
10963 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.655f7c17-2f85-42e5-9ffe-777a8a48dda3
 2 TiB Thu Jan 21 09:00:09 2021 mirror (primary peer_uuids:[]) 
10993 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.268b960c-51e9-4a60-99b4-c5e7c303fdd8
 2 TiB Thu Jan 21 09:05:25 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have removed the 5 minute schedule for now, but I don't think this should be 
expected behavior? 


From: "adamb"  
To: "ceph-users"  
Sent: Thursday, January 21, 2021 7:40:01 AM 
Subject: [ceph-users] RBD-Mirror Mirror Snapshot stuck 

I have a rbd-mirror snapshot on 1 image that failed to replicate and now its 
not getting cleaned up. 

The cause of this was my fault based on my steps. Just trying to understand how 
to clean up/handle the situation. 

Here is how I got into this situation. 

- Created manual rbd snapshot on the image 
- On the remote cluster I cloned the snapshot 
- While cloned on the secondary cluster I made the mistake of deleting the 
snapshot on the primary 
- The subsequent mirror snapshot failed 
- I then removed the clone 
- The next mirror snapshot was successful but I was left with this mirror 
snapshot on the primary that I can't seem to get rid of 

root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
10082 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
 2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[]) 
10243 
.mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e
 2 TiB Thu Jan 21 07:30:08 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

I have tried deleting the snap with "rbd snap rm" like normal user created 
snaps, but no luck. Anyway to force the deletion? 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread huxia...@horebdata.cn
Does Ceph now support volume groups of RBDs? From which version, if any?

regards,

samuel



huxia...@horebdata.cn
 
From: Robert Sander
Date: 2021-01-21 10:57
To: ceph-users
Subject: [ceph-users] Re: Large rbd
Hi,
 
Am 21.01.21 um 05:42 schrieb Chris Dunlop:
 
> Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
> it just "this is crazy large, if you're trying to go over this you're
> doing something wrong, rethink your life..."?
 
IMHO the limit is there because of the way deletion of RBDs work. "rbd
rm" has to look for every object, not only the ones that were really
created. This would make deleting a very very large RBD take a very very
long time.
 
> Rather than a single large rbd, should I be looking at multiple smaller
> rbds linked together using lvm or somesuch? What are the tradeoffs?
 
IMHO there are no tradeoffs, there could even be benefits creating a
volume group with multiple physical volumes on RBD as the requests can
be bettere parallelized (i.e. virtio-single SCSI controller for qemu).
 
Regards
-- 
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
 
http://www.heinlein-support.de
 
Tel: 030 / 405051-43
Fax: 030 / 405051-19
 
Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein -- Sitz: Berlin
 
 
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote:
>
> When cloning the snapshot on the remote cluster I can't see my ext4 
> filesystem.
>
> Using the same exact snapshot on both sides.  Shouldn't this be consistent?

Yes. Has the replication process completed ("rbd mirror image status
CephTestPool1/vm-100-disk-0")?

> Primary Site
> root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1
>  10621  TestSnapper1  
>  2 TiB Thu Jan 21 08:15:22 2021  user
>
> root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE
> root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
> --keyring /etc/ceph/ceph.client.admin.keyring
> /dev/nbd0
> root@Ccscephtest1:~# mount /dev/nbd0 /usr2
>
> Secondary Site
> root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1
>  10430  TestSnapper1  
>  2 TiB Thu Jan 21 08:20:08 2021  user
>
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring
> /dev/nbd0
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2
> mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
> codepage or helper program, or other error.
>
>
>
> 
> From: "adamb" 
> To: "dillaman" 
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder" 
> Sent: Wednesday, January 20, 2021 3:42:46 PM
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
>
> Awesome information.  I new I had to be missing something.
>
> All of my clients will be far newer than mimic so I don't think that will be 
> an issue.
>
> Added the following to my ceph.conf on both clusters.
>
> rbd_default_clone_format = 2
>
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE
> root@Bunkcephmon2:~# rbd ls CephTestPool2
> vm-100-disk-0-CLONE
>
> I am sure I will be back with more questions.  Hoping to replace our Nimble 
> storage with Ceph and NVMe.
>
> Appreciate it!
>
> 
> From: "Jason Dillaman" 
> To: "adamb" 
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder" 
> Sent: Wednesday, January 20, 2021 3:28:39 PM
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
>
> On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote:
> >
> > That's what I though as well, specially based on this.
> >
> >
> >
> > Note
> >
> > You may clone a snapshot from one pool to an image in another pool. For 
> > example, you may maintain read-only images and snapshots as templates in 
> > one pool, and writeable clones in another pool.
> >
> > root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool2/vm-100-disk-0-CLONE
> > 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> > 0x55c7cf8417f0 validate_parent: parent snapshot must be protected
> >
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1
> > rbd: protecting snap failed: (30) Read-only file system
>
> You have two options: (1) protect the snapshot on the primary image so
> that the protection status replicates or (2) utilize RBD clone v2
> which doesn't require protection but does require Mimic or later
> clients [1].
>
> >
> > From: "Eugen Block" 
> > To: "adamb" 
> > Cc: "ceph-users" , "Matt Wilder" 
> > 
> > Sent: Wednesday, January 20, 2021 3:00:54 PM
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
> >
> > But you should be able to clone the mirrored snapshot on the remote
> > cluster even though it’s not protected, IIRC.
> >
> >
> > Zitat von Adam Boyhan :
> >
> > > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300
> > > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro
> > > white papers.
> > >
> > > When I attempt to protect the snapshot on a remote image, it errors
> > > with read only.
> > >
> > > root@Bunkcephmon2:~# rbd snap protect
> > > CephTestPool1/vm-100-disk-0@TestSnapper1
> > > rbd: protecting snap failed: (30) Read-only file system
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> [1] https://ceph.io/community/new-mimic-simplified-rbd-image-cloning/
>
> --
> Jason
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: RBD-Mirror Mirror Snapshot stuck

2021-01-21 Thread Jason Dillaman
We actually have a bunch of bug fixes for snapshot-based mirroring
pending for the next Octopus release. I think this stuck snapshot case
has been fixed, but I'll try to verify on the Pacific branch to make
sure.

On Thu, Jan 21, 2021 at 9:11 AM Adam Boyhan  wrote:
>
> Decided to request a resync to see the results, I have a very aggressive 
> snapshot mirror schedule of 5 minutes, replication just keeps starting on the 
> latest snapshot before it finishes. Pretty sure this would just loop over and 
> over if I don't remove the schedule.
>
> root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE
> 10082 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
>  2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[])
> 10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user
> 10883 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.7242f4d1-5203-4273-8b6d-ff4e1411216d
>  2 TiB Thu Jan 21 08:50:08 2021 mirror (primary peer_uuids:[])
> 10923 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.d0c3c2e7-880b-4e62-90cc-fd501e9a87c9
>  2 TiB Thu Jan 21 08:55:11 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
> 10963 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.655f7c17-2f85-42e5-9ffe-777a8a48dda3
>  2 TiB Thu Jan 21 09:00:09 2021 mirror (primary peer_uuids:[])
> 10993 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.268b960c-51e9-4a60-99b4-c5e7c303fdd8
>  2 TiB Thu Jan 21 09:05:25 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>
> I have removed the 5 minute schedule for now, but I don't think this should 
> be expected behavior?
>
>
> From: "adamb" 
> To: "ceph-users" 
> Sent: Thursday, January 21, 2021 7:40:01 AM
> Subject: [ceph-users] RBD-Mirror Mirror Snapshot stuck
>
> I have a rbd-mirror snapshot on 1 image that failed to replicate and now its 
> not getting cleaned up.
>
> The cause of this was my fault based on my steps. Just trying to understand 
> how to clean up/handle the situation.
>
> Here is how I got into this situation.
>
> - Created manual rbd snapshot on the image
> - On the remote cluster I cloned the snapshot
> - While cloned on the secondary cluster I made the mistake of deleting the 
> snapshot on the primary
> - The subsequent mirror snapshot failed
> - I then removed the clone
> - The next mirror snapshot was successful but I was left with this mirror 
> snapshot on the primary that I can't seem to get rid of
>
> root@Ccscephtest1:/var/log/ceph# rbd snap ls --all CephTestPool1/vm-100-disk-0
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE
> 10082 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.e0c63479-b09e-4c66-a65b-085b67a19907
>  2 TiB Thu Jan 21 07:10:09 2021 mirror (primary peer_uuids:[])
> 10243 
> .mirror.primary.90c53c21-6951-4218-9f07-9e983d490993.483e55aa-2f64-4bb0-ac0f-7b5aac59830e
>  2 TiB Thu Jan 21 07:30:08 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>
> I have tried deleting the snap with "rbd snap rm" like normal user created 
> snaps, but no luck. Anyway to force the deletion?
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread Loris Cuoghi

Hi,

I think what's being suggested here is to create a good old LVM VG in a 
virtualized guest, from multiple RBDs, each accessed as a separate 
VirtIO SCSI device.


As each storage device in the LVM VG has its own queues at the VirtIO / 
QEMU / RBD interface levels, that would allow for greater parallel 
performance.


Loris Cuoghi


On 21/01/21 15:15, huxia...@horebdata.cn wrote:

Does Ceph now supports volume group of RBDs?  From which version if any?

regards,

samuel



huxia...@horebdata.cn
  
From: Robert Sander

Date: 2021-01-21 10:57
To: ceph-users
Subject: [ceph-users] Re: Large rbd
Hi,
  
Am 21.01.21 um 05:42 schrieb Chris Dunlop:
  

Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
it just "this is crazy large, if you're trying to go over this you're
doing something wrong, rethink your life..."?
  
IMHO the limit is there because of the way deletion of RBDs work. "rbd

rm" has to look for every object, not only the ones that were really
created. This would make deleting a very very large RBD take a very very
long time.
  

Rather than a single large rbd, should I be looking at multiple smaller
rbds linked together using lvm or somesuch? What are the tradeoffs?
  
IMHO there are no tradeoffs, there could even be benefits creating a

volume group with multiple physical volumes on RBD as the requests can
be bettere parallelized (i.e. virtio-single SCSI controller for qemu).
  
Regards

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
After the resync finished, I can mount it now. 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
CephTestPool1/vm-100-disk-0-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 

It makes me a bit nervous how it got into that position while everything appeared 
OK. 


From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 9:25:11 AM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote: 
> 
> When cloning the snapshot on the remote cluster I can't see my ext4 
> filesystem. 
> 
> Using the same exact snapshot on both sides. Shouldn't this be consistent? 

Yes. Has the replication process completed ("rbd mirror image status 
CephTestPool1/vm-100-disk-0")? 

> Primary Site 
> root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1 
> 10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
> 
> root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id admin 
> --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
> 
> Secondary Site 
> root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> TestSnapper1 
> 10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 
> 
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
> codepage or helper program, or other error. 
> 
> 
> 
>  
> From: "adamb"  
> To: "dillaman"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Wednesday, January 20, 2021 3:42:46 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> Awesome information. I new I had to be missing something. 
> 
> All of my clients will be far newer than mimic so I don't think that will be 
> an issue. 
> 
> Added the following to my ceph.conf on both clusters. 
> 
> rbd_default_clone_format = 2 
> 
> root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool2/vm-100-disk-0-CLONE 
> root@Bunkcephmon2:~# rbd ls CephTestPool2 
> vm-100-disk-0-CLONE 
> 
> I am sure I will be back with more questions. Hoping to replace our Nimble 
> storage with Ceph and NVMe. 
> 
> Appreciate it! 
> 
>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Wednesday, January 20, 2021 3:28:39 PM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote: 
> > 
> > That's what I though as well, specially based on this. 
> > 
> > 
> > 
> > Note 
> > 
> > You may clone a snapshot from one pool to an image in another pool. For 
> > example, you may maintain read-only images and snapshots as templates in 
> > one pool, and writeable clones in another pool. 
> > 
> > root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool2/vm-100-disk-0-CLONE 
> > 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> > 0x55c7cf8417f0 validate_parent: parent snapshot must be protected 
> > 
> > root@Bunkcephmon2:~# rbd snap protect 
> > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > rbd: protecting snap failed: (30) Read-only file system 
> 
> You have two options: (1) protect the snapshot on the primary image so 
> that the protection status replicates or (2) utilize RBD clone v2 
> which doesn't require protection but does require Mimic or later 
> clients [1]. 
> 
> > 
> > From: "Eugen Block"  
> > To: "adamb"  
> > Cc: "ceph-users" , "Matt Wilder" 
> >  
> > Sent: Wednesday, January 20, 2021 3:00:54 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > But you should be able to clone the mirrored snapshot on the remote 
> > cluster even though it’s not protected, IIRC. 
> > 
> > 
> > Zitat von Adam Boyhan : 
> > 
> > > Two separate 4 node clusters with 10 OSD's in each node. Micron 9300 
> > > NVMe's are the OSD drives. Heavily based on the Micron/Supermicro 
> > > white papers. 
> > > 
> > > When I attempt to protect the snapshot on a remote image, it errors 
> > > with read only. 
> > > 
> > > root@Bunkcephmon2:~# rbd snap protect 
> > > CephTestPool1/vm-100-disk-0@TestSnapper1 
> > > rbd: protecting snap failed: (30) Read-only file system 

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 9:40 AM Adam Boyhan  wrote:
>
> After the resync finished.  I can mount it now.
>
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring
> /dev/nbd0
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2
>
> Makes me a bit nervous how it got into that position and everything appeared 
> ok.

We unfortunately need to create the snapshots that are being synced as
a first step, but perhaps there are some extra guardrails we can put
on the system to prevent premature usage if the sync status doesn't
indicate that it's complete.
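
In the meantime, a possible client-side guard (just a sketch, using the
pool/image names from this thread; it keys off the "copied" state that
"rbd snap ls --all" reports for non-primary mirror snapshots):

# only clone once the newest non-primary mirror snapshot reports "copied"
if rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep non_primary | tail -n 1 | grep -q copied; then
    rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 CephTestPool1/vm-100-disk-0-CLONE
else
    echo "mirror snapshot not fully synced yet" >&2
fi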

> 
> From: "Jason Dillaman" 
> To: "adamb" 
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder" 
> Sent: Thursday, January 21, 2021 9:25:11 AM
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
>
> On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote:
> >
> > When cloning the snapshot on the remote cluster I can't see my ext4 
> > filesystem.
> >
> > Using the same exact snapshot on both sides.  Shouldn't this be consistent?
>
> Yes. Has the replication process completed ("rbd mirror image status
> CephTestPool1/vm-100-disk-0")?
>
> > Primary Site
> > root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1
> >  10621  TestSnapper1
> >2 TiB Thu Jan 21 08:15:22 2021  user
> >
> > root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE
> > root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring
> > /dev/nbd0
> > root@Ccscephtest1:~# mount /dev/nbd0 /usr2
> >
> > Secondary Site
> > root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1
> >  10430  TestSnapper1
> >2 TiB Thu Jan 21 08:20:08 2021  user
> >
> > root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE
> > root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring
> > /dev/nbd0
> > root@Bunkcephtest1:~# mount /dev/nbd0 /usr2
> > mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, 
> > missing codepage or helper program, or other error.
> >
> >
> >
> > 
> > From: "adamb" 
> > To: "dillaman" 
> > Cc: "Eugen Block" , "ceph-users" , "Matt 
> > Wilder" 
> > Sent: Wednesday, January 20, 2021 3:42:46 PM
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
> >
> > Awesome information.  I new I had to be missing something.
> >
> > All of my clients will be far newer than mimic so I don't think that will 
> > be an issue.
> >
> > Added the following to my ceph.conf on both clusters.
> >
> > rbd_default_clone_format = 2
> >
> > root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool2/vm-100-disk-0-CLONE
> > root@Bunkcephmon2:~# rbd ls CephTestPool2
> > vm-100-disk-0-CLONE
> >
> > I am sure I will be back with more questions.  Hoping to replace our Nimble 
> > storage with Ceph and NVMe.
> >
> > Appreciate it!
> >
> > 
> > From: "Jason Dillaman" 
> > To: "adamb" 
> > Cc: "Eugen Block" , "ceph-users" , "Matt 
> > Wilder" 
> > Sent: Wednesday, January 20, 2021 3:28:39 PM
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses
> >
> > On Wed, Jan 20, 2021 at 3:10 PM Adam Boyhan  wrote:
> > >
> > > That's what I though as well, specially based on this.
> > >
> > >
> > >
> > > Note
> > >
> > > You may clone a snapshot from one pool to an image in another pool. For 
> > > example, you may maintain read-only images and snapshots as templates in 
> > > one pool, and writeable clones in another pool.
> > >
> > > root@Bunkcephmon2:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > > CephTestPool2/vm-100-disk-0-CLONE
> > > 2021-01-20T15:06:35.854-0500 7fb889ffb700 -1 librbd::image::CloneRequest: 
> > > 0x55c7cf8417f0 validate_parent: parent snapshot must be protected
> > >
> > > root@Bunkcephmon2:~# rbd snap protect 
> > > CephTestPool1/vm-100-disk-0@TestSnapper1
> > > rbd: protecting snap failed: (30) Read-only file system
> >
> > You have two options: (1) protect the snapshot on the primary image so
> > that the protection status replicates or (2) utilize RBD clone v2
> > which doesn't require protection but does require Mimic or later
> > clients [1].
> >
> > >
> > > From: "Eugen Block" 
> > > To: "adamb" 
> > > Cc: "ceph-users" , "Matt Wilder" 
> > > 
> > > Sent: Wednesday, January 20, 2021 3:00:54 PM
> > > Subject: Re: [ceph-users] Re: 

[ceph-users] Re: Large rbd

2021-01-21 Thread John Petrini
I've always been curious about this. Does anyone have any experience
spanning an LVM VG over multiple RBDs?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm db_slots and wal_slots ignored

2021-01-21 Thread Schweiss, Chip
That worked!  Thanks!

Now to figure out how to correct all the incorrect OSDs.
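
For reference, a sketch of the spec with Eugen's suggestion applied (the
64G/12G sizes and the limits are example values, and this assumes
block_db_size/block_wal_size are honored on 15.2.8):

cat > osd_spec.yml <<'EOF'
service_type: osd
service_id: three_tier_osd
placement:
  host_pattern: '*'
data_devices:
  rotational: 1
  model: 'ST14000NM0288'
db_devices:
  rotational: 0
  model: 'INTEL SSDPE2KX020T8'
  limit: 6
block_db_size: 64G
wal_devices:
  model: 'INTEL SSDPEL1K200GA'
  limit: 12
block_wal_size: 12G
EOF
ceph orch apply -i osd_spec.yml   # re-apply the spec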



On Thu, Jan 21, 2021 at 1:29 AM Eugen Block  wrote:

> If you use block_db_size and limit in your yaml file, e.g.
>
> block_db_size: 64G  (or whatever you choose)
> limit: 6
>
> this should not consume the entire disk but only as much as you
> configured. Can you try if that works for you?
>
>
> Zitat von "Schweiss, Chip" :
>
> > I'm trying to set up a new ceph cluster with cephadm on a SUSE SES trial
> > that has Ceph 15.2.8
> >
> > Each OSD node has 18 rotational SAS disks, 4 NVMe 2TB SSDs for DB, and 2
> > NVME2 200GB Optane SSDs for WAL.
> >
> > These servers will eventually have 24 rotational SAS disks that they will
> > inherit from existing storage servers.  So I don't want all the space
> used
> > on the DB and WAL SSDs.
> >
> > I suspect from the comment "(db_slots is actually to be favoured here,
> but
> > it's not implemented yet)" on this page in the docs:
> > https://docs.ceph.com/en/latest/cephadm/drivegroups/#the-advanced-case
> these
> > parameters are not yet implemented, yet are documented as such under
> > "ADDITIONAL OPTIONS"
> >
> > My osd_spec.yml:
> > service_type: osd
> > service_id: three_tier_osd
> > placement:
> >   host_pattern: '*'
> > data_devices:
> >   rotational: 1
> >   model: 'ST14000NM0288'
> > db_devices:
> >   rotational: 0
> >   model: 'INTEL SSDPE2KX020T8'
> >   limit: 6
> > wal_devices:
> >   model: 'INTEL SSDPEL1K200GA'
> >   limit: 12
> > db_slots: 6
> > wal_slots: 12
> >
> > All available space is consumed on my DB and WAL SSDs with only 18 OSDs,
> > leaving no room to add additional spindles.
> >
> > Is this still work in progress, or a bug I should report?  Possibly
> related
> > to https://github.com/rook/rook/issues/5026  At the minimum, this
> appears
> > to be a documentation bug.
> >
> > How can I work around this?
> >
> > -Chip
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread Alex Gorbachev
On Thu, Jan 21, 2021 at 9:47 AM John Petrini  wrote:

> I've always been curious about this. Does anyone have any experience
> spanning an LVM VG over multiple RBDs?
>

I do on RHEL, it works very well.  Each RBD device has some inherent IO
limitations, but using multiple in parallel works quite well.  We never
were able to use fancy striping for this purpose, but LVM is simple, well
tested, and does not require any extra considerations.  We currently just
use spanned volumes, not striped (although this would be interesting), and
allow parallelization to take place via the database itself - the parallel IOs
are not necessarily aligned on RBD boundaries, but for our purposes, and
using all-SSD OSDs, this is enough.
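
For what it's worth, the striped variant would just be a different lvcreate
call (a sketch with 4 RBD-backed PVs and an arbitrary 64k stripe size):

lvcreate -n lv_data -l 100%FREE -i 4 -I 64k vg_rbd   # stripe across all 4 PVs

Whether that actually helps probably depends on how the stripe size lines up
with the RBD object layout.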

Alex Gorbachev
iss-integration.com


> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread David Majchrzak, ODERLAND Webbhotell AB
We do it in production; we haven't benchmarked it though, if that's what 
you're aiming for. The general consensus when we started with it was that it 
allowed for greater performance (we use librbd with KVM).


--

David Majchrzak
CTO
Oderland Webbhotell AB
Östra Hamngatan 50B, 411 09 Göteborg, SWEDEN

Den 2021-01-21 kl. 15:46, skrev John Petrini:

I've always been curious about this. Does anyone have any experience
spanning an LVM VG over multiple RBDs?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mds openfiles table shards

2021-01-21 Thread Dan van der Ster
Hi all,

During rejoin an MDS can sometimes go OOM if the openfiles table is too large.
The workaround has been described by ceph devs as "rados rm -p
cephfs_metadata mds0_openfiles.0".

On our cluster we have several such objects for rank 0:

mds0_openfiles.0 exists with size: 199978
mds0_openfiles.1 exists with size: 153650
mds0_openfiles.2 exists with size: 40987
mds0_openfiles.3 exists with size: 7746
mds0_openfiles.4 exists with size: 413

If we suffer such an OOM, do we need to rm *all* of those objects or
only the `.0` object?
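
In case it's useful context, this is roughly how we look at them, and how we
would remove them all if that turns out to be the answer (pool name as in our
setup):

# check the rank-0 shards and their sizes
for i in 0 1 2 3 4; do rados -p cephfs_metadata stat mds0_openfiles.$i; done

# if all shards need to go, remove them while the rank is down:
# for i in 0 1 2 3 4; do rados -p cephfs_metadata rm mds0_openfiles.$i; done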

Best Regards,

Dan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
I have noticed that RBD-Mirror snapshot mode can only manage to take 1 snapshot 
per second. For example, I have 21 images in a single pool. When the schedule is 
triggered it takes the mirror snapshot of each image one at a time. It doesn't 
feel or look like a performance issue, as the OSDs are Micron 9300 PRO NVMe's 
and each server has 2x Intel Platinum 8268 CPUs. 

I was hoping that adding more RBD-Mirror instances would help, but that only 
seems to help with overall throughput. As it sits I have 3 RBD-Mirror instances 
running on each cluster. 

We run a 30-minute snapshot schedule to our remote site as it is; based on that 
I can only squeeze in 1800 mirror snaps every 30 minutes. 

I was hoping there might be something I am missing with RBD-Mirror as far as 
scaling goes. 

Maybe multiple pools would be a solution and have other benefits? 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
I was able to trigger the issue again. 

- On the primary I created a snap called TestSnapper for disk vm-100-disk-1 
- Allowed the next RBD-Mirror scheduled snap to complete 
- At this point the snapshot is showing up on the remote side. 

root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
vm-100-disk-1: 
global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
state: up+replaying 
description: replaying, 
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
 
service: admin on Bunkcephmon1 
last_update: 2021-01-21 11:46:24 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-21 11:46:28 

root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all CephTestPool1/vm-100-disk-1 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
11532 TestSnapper 2 TiB Thu Jan 21 11:21:25 2021 user 
11573 
.mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
 2 TiB Thu Jan 21 11:35:00 2021 mirror (primary 
peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 

It seems like the sync is complete, so I then clone it, map it and attempt to 
mount it. 

root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper 
CephTestPool1/vm-100-disk-1-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE --id admin 
--keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 

On the primary still no issues 

root@Ccscephtest1:/etc/pve/priv# rbd clone 
CephTestPool1/vm-100-disk-1@TestSnapper CephTestPool1/vm-100-disk-1-CLONE 
root@Ccscephtest1:/etc/pve/priv# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
--id admin --keyring /etc/ceph/ceph.client.admin.keyring 
/dev/nbd0 
root@Ccscephtest1:/etc/pve/priv# mount /dev/nbd0 /usr2 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 9:42:26 AM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 9:40 AM Adam Boyhan  wrote: 
> 
> After the resync finished. I can mount it now. 
> 
> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> CephTestPool1/vm-100-disk-0-CLONE 
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring 
> /dev/nbd0 
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> 
> Makes me a bit nervous how it got into that position and everything appeared 
> ok. 

We unfortunately need to create the snapshots that are being synced as 
a first step, but perhaps there are some extra guardrails we can put 
on the system to prevent premature usage if the sync status doesn't 
indicate that it's complete. 

>  
> From: "Jason Dillaman"  
> To: "adamb"  
> Cc: "Eugen Block" , "ceph-users" , "Matt 
> Wilder"  
> Sent: Thursday, January 21, 2021 9:25:11 AM 
> Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> 
> On Thu, Jan 21, 2021 at 8:34 AM Adam Boyhan  wrote: 
> > 
> > When cloning the snapshot on the remote cluster I can't see my ext4 
> > filesystem. 
> > 
> > Using the same exact snapshot on both sides. Shouldn't this be consistent? 
> 
> Yes. Has the replication process completed ("rbd mirror image status 
> CephTestPool1/vm-100-disk-0")? 
> 
> > Primary Site 
> > root@Ccscephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1 
> > 10621 TestSnapper1 2 TiB Thu Jan 21 08:15:22 2021 user 
> > 
> > root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE 
> > root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring 
> > /dev/nbd0 
> > root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
> > 
> > Secondary Site 
> > root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-0 | grep 
> > TestSnapper1 
> > 10430 TestSnapper1 2 TiB Thu Jan 21 08:20:08 2021 user 
> > 
> > root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-0@TestSnapper1 
> > CephTestPool1/vm-100-disk-0-CLONE 
> > root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-0-CLONE --id 
> > admin --keyring /etc/ceph/ceph.client.admin.keyring 
> > /dev/nbd0 
> > root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
> > mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, 
> > missing codepage or helper program, or other error. 
> > 
> > 
> > 
> >  
> > From: "adamb"  
> > To: "dillaman"  
> > Cc: "Eugen Block" , "ceph-users" , "Matt 
> > Wilder"  
> > Sent: Wednesday, January 20, 2021 3:42:46 PM 
> > Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 
> > 
> > Awesome information. 

[ceph-users] Re: RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
Looks like a script and cron will be a solid workaround. 

Still interested to know if there are any options to make it so rbd-mirror can 
take more than 1 mirror snap per second. 
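
For the record, the workaround is basically a one-liner plus cron (pool name
and the parallelism are assumptions; it uses "rbd mirror image snapshot", the
same call the scheduler makes per image):

# take a mirror snapshot of every image in the pool, 8 at a time
rbd ls CephTestPool1 | xargs -P 8 -I{} rbd mirror image snapshot CephTestPool1/{}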



From: "adamb"  
To: "ceph-users"  
Sent: Thursday, January 21, 2021 11:18:36 AM 
Subject: [ceph-users] RBD-Mirror Snapshot Scalability 

I have noticed that RBD-Mirror snapshot mode can only manage to take 1 snapshot 
per second. For example I have 21 images in a single pool. When the schedule is 
triggered it takes the mirror snapshot of each image 1 at a time. It doesn't 
feel or look like a performance issue as the OSD's are Micron 9300 PRO NVMe's 
and each server has 2x Intel Platinum 8268 CPU's. 

I was hoping that adding more RDB-Mirror instance would help, but that only 
seems to help with overall throughput. As it sits I have 3 RBD-Mirror instances 
running on each cluster. 

We run a 30 minute snapshot schedule to our remote site as it is, based on that 
I can only squeeze 1800 mirror snaps every 30 minutes. 

I was hoping there might be something I am missing with RBD-Mirror as far as 
scaling goes. 

Maybe multiple pools would be a solution and have other benefits? 




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Scalability

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 2:00 PM Adam Boyhan  wrote:
>
> Looks like a script and cron will be a solid work around.
>
> Still interested to know if there are any options to make it so rbd-mirror 
> can take more than 1 mirror snap per second.
>
>
>
> From: "adamb" 
> To: "ceph-users" 
> Sent: Thursday, January 21, 2021 11:18:36 AM
> Subject: [ceph-users] RBD-Mirror Snapshot Scalability
>
> I have noticed that RBD-Mirror snapshot mode can only manage to take 1 
> snapshot per second. For example I have 21 images in a single pool. When the 
> schedule is triggered it takes the mirror snapshot of each image 1 at a time. 
> It doesn't feel or look like a performance issue as the OSD's are Micron 9300 
> PRO NVMe's and each server has 2x Intel Platinum 8268 CPU's.

The creation of snapshot IDs is limited by the MONs' quorum process. It
can issue multiple IDs in a single batch, but they all need to be
queued up. The most recent version of the MGR's RBD mirror snapshot
scheduler works asynchronously, so it can start multiple snapshots
concurrently. It's much better, but it still won't scale to hundreds of
snapshots per second (not that your cluster could keep up even if the
MONs could).

> I was hoping that adding more RDB-Mirror instance would help, but that only 
> seems to help with overall throughput. As it sits I have 3 RBD-Mirror 
> instances running on each cluster.
>
> We run a 30 minute snapshot schedule to our remote site as it is, based on 
> that I can only squeeze 1800 mirror snaps every 30 minutes.

Honestly, you might be at the bleeding edge here with an attempt to
replicate >1,800 images. Getting feedback from deployments like yours
can help us improve the software since we, realistically, don't have
the compute resources to easily test at large scale.

> I was hoping there might be something I am missing with RBD-Mirror as far as 
> scaling goes.
>
> Maybe multiple pools would be a solution and have other benefits?
>
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Scalability

2021-01-21 Thread Adam Boyhan
Let me just start off by saying, I really appreciate all your input so far. It's 
been a huge help! 

Even if it can scale to 10-20 per second, that would make things far more 
viable. Sounds like it shouldn't be much of an issue with the changes you 
mentioned. 

As it sits we have roughly 1300 (and growing) LVM Logical Volumes that we would 
like to move over to Ceph and replicate. These are currently running on one 
large Nimble HPE volume. 

For testing I currently have 2x of the OSD nodes I plan on using in production. 

SuperMicro SYS-1029U-TN10RT 
2x Platinum 8268 2.9Ghz CPU 
10x Micron 9300 PRO 15.36TB Drives 
386G Ram 
1x Micron 5300 PRO 960G for OS 

I virtualized each "site's" cluster using Proxmox. Proxmox is also the OS of the 
OSD/MON VMs. 

The full production setup will have a dedicated 100G network for the private 
network and a dedicated 10G network for the public network. 

If things go well with this testing setup, I hope to have the production 
hardware by June. Possibly moving LV's to RBD's by next year at this time. 

I am ready/eager to help the Ceph project in anyway I can. 






From: "Jason Dillaman"  
To: "adamb"  
Cc: "ceph-users"  
Sent: Thursday, January 21, 2021 2:18:06 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Scalability 

On Thu, Jan 21, 2021 at 2:00 PM Adam Boyhan  wrote: 
> 
> Looks like a script and cron will be a solid work around. 
> 
> Still interested to know if there are any options to make it so rbd-mirror 
> can take more than 1 mirror snap per second. 
> 
> 
> 
> From: "adamb"  
> To: "ceph-users"  
> Sent: Thursday, January 21, 2021 11:18:36 AM 
> Subject: [ceph-users] RBD-Mirror Snapshot Scalability 
> 
> I have noticed that RBD-Mirror snapshot mode can only manage to take 1 
> snapshot per second. For example I have 21 images in a single pool. When the 
> schedule is triggered it takes the mirror snapshot of each image 1 at a time. 
> It doesn't feel or look like a performance issue as the OSD's are Micron 9300 
> PRO NVMe's and each server has 2x Intel Platinum 8268 CPU's. 

The creation of snapshot ids is limited by the MONs quorum process. It 
can issue multiple ids in a single batch, but they all need to be 
queued up. The most recent version of the MGR's RBD mirror snapshot 
scheduler works asynchronously so it can start multiple snapshots 
concurrently. It's much better but it still won't scale to hundreds of 
snapshots per second (*not that your cluster could even keep up 
regardless even if the MONs could). 

> I was hoping that adding more RDB-Mirror instance would help, but that only 
> seems to help with overall throughput. As it sits I have 3 RBD-Mirror 
> instances running on each cluster. 
> 
> We run a 30 minute snapshot schedule to our remote site as it is, based on 
> that I can only squeeze 1800 mirror snaps every 30 minutes. 

Honestly, you might be at the bleeding edge here with an attempt to 
replicate >1,800 images. Getting feedback from deployments like yours 
can help us improve the software since we, realistically, don't have 
the compute resources to easily test at large scale. 

> I was hoping there might be something I am missing with RBD-Mirror as far as 
> scaling goes. 
> 
> Maybe multiple pools would be a solution and have other benefits? 
> 
> 
> 
> 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> ___ 
> ceph-users mailing list -- ceph-users@ceph.io 
> To unsubscribe send an email to ceph-users-le...@ceph.io 
> 


-- 
Jason 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 11:51 AM Adam Boyhan  wrote:
>
> I was able to trigger the issue again.
>
> - On the primary I created a snap called TestSnapper for disk vm-100-disk-1
> - Allowed the next RBD-Mirror scheduled snap to complete
> - At this point the snapshot is showing up on the remote side.
>
> root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1
> vm-100-disk-1:
>   global_id:   a04e92df-3d64-4dc4-8ac8-eaba17b45403
>   state:   up+replaying
>   description: replaying, 
> {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
>   service: admin on Bunkcephmon1
>   last_update: 2021-01-21 11:46:24
>   peer_sites:
> name: ccs
> state: up+stopped
> description: local image is primary
> last_update: 2021-01-21 11:46:28
>
> root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all CephTestPool1/vm-100-disk-1
> SNAPID  NAME  
>  SIZE   PROTECTED  TIMESTAMP NAMESPACE
>  11532  TestSnapper   
>  2 TiB Thu Jan 21 11:21:25 2021  user
>  11573  
> .mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
>   2 TiB Thu Jan 21 11:35:00 2021  mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770])
>
> Seems like the sync is complete, So I then clone it, map it and attempt to 
> mount it.

Can you run "snap ls --all" on the non-primary cluster? The
non-primary snapshot will list its status. On my cluster (with a much
smaller image):

#
# CLUSTER 1
#
$ rbd --cluster cluster1 create --size 1G mirror/image1
$ rbd --cluster cluster1 mirror image enable mirror/image1 snapshot
Mirroring enabled
$ rbd --cluster cluster1 device map -t nbd mirror/image1
/dev/nbd0
$ mkfs.ext4 /dev/nbd0
mke2fs 1.45.5 (07-Jan-2020)
Discarding device blocks: done
Creating filesystem with 262144 4k blocks and 65536 inodes
Filesystem UUID: 50e0da12-1f99-4d45-b6e6-5f7a7decaeff
Superblock backups stored on blocks:
32768, 98304, 163840, 229376

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
$ blkid /dev/nbd0
/dev/nbd0: UUID="50e0da12-1f99-4d45-b6e6-5f7a7decaeff"
BLOCK_SIZE="4096" TYPE="ext4"
$ rbd --cluster cluster1 snap create mirror/image1@fs
Creating snap: 100% complete...done.
$ rbd --cluster cluster1 mirror image snapshot mirror/image1
Snapshot ID: 6
$ rbd --cluster cluster1 snap ls --all mirror/image1
SNAPID  NAME
SIZE   PROTECTED  TIMESTAMP
 NAMESPACE
 5  fs
1 GiB Thu Jan 21 14:50:24 2021
 user
 6  
.mirror.primary.f9f692b8-2405-416c-9247-5628e303947a.39722e17-f7e6-4050-acf0-3842a5620d81
 1 GiB Thu Jan 21 14:50:51 2021  mirror (primary
peer_uuids:[cd643f30-4982-4caf-874d-cf21f6f4b66f])

#
# CLUSTER 2
#

$ rbd --cluster cluster2 mirror image status mirror/image1
image1:
  global_id:   f9f692b8-2405-416c-9247-5628e303947a
  state:   up+replaying
  description: replaying,
{"bytes_per_second":1140872.53,"bytes_per_snapshot":17113088.0,"local_snapshot_timestamp":1611258651,"remote_snapshot_timestamp":1611258651,"replay_state":"idle"}
  service: mirror.0 on cube-1
  last_update: 2021-01-21 14:51:18
  peer_sites:
name: cluster1
state: up+stopped
description: local image is primary
last_update: 2021-01-21 14:51:27
$ rbd --cluster cluster2 snap ls --all mirror/image1
SNAPID  NAME
SIZE   PROTECTED  TIMESTAMP
 NAMESPACE
 5  fs
1 GiB Thu Jan 21 14:50:52
2021  user
 6  
.mirror.non_primary.f9f692b8-2405-416c-9247-5628e303947a.0a13b822-0508-47d6-a460-a8cc4e012686
 1 GiB Thu Jan 21 14:50:53 2021  mirror (non-primary
peer_uuids:[] 9824df2b-86c4-4264-a47e-cf968efd09e1:6 copied)
$ rbd --cluster cluster2 --rbd-default-clone-format 2 clone
mirror/image1@fs mirror/image2
$ rbd --cluster cluster2 device map -t nbd mirror/image2
/dev/nbd1
$ blkid /dev/nbd1
/dev/nbd1: UUID="50e0da12-1f99-4d45-b6e6-5f7a7decaeff"
BLOCK_SIZE="4096" TYPE="ext4"
$ mount /dev/nbd1 /mnt/
$ mount | grep nbd
/dev/nbd1 on /mnt type ext4 (rw,relatime,seclabel)
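
(Side note: the "--rbd-default-clone-format 2" above is what allows cloning
from an unprotected snapshot. If that should be the default, it can also be
set persistently -- a sketch, assuming the usual client config scope:

$ ceph config set client rbd_default_clone_format 2

after which a plain "rbd clone" of an unprotected snapshot should work.)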

> root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper 
> CephTestPool1/vm-100-disk-1-CLONE
> root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE --id 
> admin --keyring /etc/ceph/ceph.client.admin.keyring
> /dev/nbd0
> root@Bunkcephtest1:~# mount /dev/nbd0 /usr2
> mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
> codepage or helper program, or other error.
>
> On the primary still no issues
>
> root@Ccscephtest1:/etc/pve/priv# rbd clone 
> CephTestPool1/vm-100-disk-1@TestSnappe

[ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses

2021-01-21 Thread Adam Boyhan
Sure thing. 

root@Bunkcephtest1:~# rbd snap ls --all CephTestPool1/vm-100-disk-1 
SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
12192 TestSnapper1 2 TiB Thu Jan 21 14:15:02 2021 user 
12595 
.mirror.non_primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.34c4a53e-9525-446c-8de6-409ea93c5edd
 2 TiB Thu Jan 21 15:05:02 2021 mirror (non-primary peer_uuids:[] 
6c26557e-d011-47b1-8c99-34cf6e0c7f2f:12801 copied) 


root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
vm-100-disk-1: 
global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
state: up+replaying 
description: replaying, 
{"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611259501,"remote_snapshot_timestamp":1611259501,"replay_state":"idle"}
 
service: admin on Bunkcephmon1 
last_update: 2021-01-21 15:06:24 
peer_sites: 
name: ccs 
state: up+stopped 
description: local image is primary 
last_update: 2021-01-21 15:06:23 


root@Bunkcephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Bunkcephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Bunkcephtest1:~# blkid /dev/nbd0 
root@Bunkcephtest1:~# mount /dev/nbd0 /usr2 
mount: /usr2: wrong fs type, bad option, bad superblock on /dev/nbd0, missing 
codepage or helper program, or other error. 


Primary still looks good. 

root@Ccscephtest1:~# rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 
CephTestPool1/vm-100-disk-1-CLONE 
root@Ccscephtest1:~# rbd-nbd map CephTestPool1/vm-100-disk-1-CLONE 
/dev/nbd0 
root@Ccscephtest1:~# blkid /dev/nbd0 
/dev/nbd0: UUID="830b8e05-d5c1-481d-896d-14e21d17017d" TYPE="ext4" 
root@Ccscephtest1:~# mount /dev/nbd0 /usr2 
root@Ccscephtest1:~# cat /proc/mounts | grep nbd0 
/dev/nbd0 /usr2 ext4 rw,relatime 0 0 
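
In case it helps once we script this, one extra sanity check would be to wait 
for the non-primary mirror snapshot to report "copied" before attempting the 
clone -- a rough sketch, just grepping the plain-text output shown above: 

$ until rbd snap ls --all CephTestPool1/vm-100-disk-1 | grep -q 'non-primary.*copied'; do sleep 10; done 
$ rbd clone CephTestPool1/vm-100-disk-1@TestSnapper1 CephTestPool1/vm-100-disk-1-CLONE 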






From: "Jason Dillaman"  
To: "adamb"  
Cc: "Eugen Block" , "ceph-users" , "Matt 
Wilder"  
Sent: Thursday, January 21, 2021 3:01:46 PM 
Subject: Re: [ceph-users] Re: RBD-Mirror Snapshot Backup Image Uses 

On Thu, Jan 21, 2021 at 11:51 AM Adam Boyhan  wrote: 
> 
> I was able to trigger the issue again. 
> 
> - On the primary I created a snap called TestSnapper for disk vm-100-disk-1 
> - Allowed the next RBD-Mirror scheduled snap to complete 
> - At this point the snapshot is showing up on the remote side. 
> 
> root@Bunkcephtest1:~# rbd mirror image status CephTestPool1/vm-100-disk-1 
> vm-100-disk-1: 
> global_id: a04e92df-3d64-4dc4-8ac8-eaba17b45403 
> state: up+replaying 
> description: replaying, 
> {"bytes_per_second":0.0,"bytes_per_snapshot":0.0,"local_snapshot_timestamp":1611247200,"remote_snapshot_timestamp":1611247200,"replay_state":"idle"}
>  
> service: admin on Bunkcephmon1 
> last_update: 2021-01-21 11:46:24 
> peer_sites: 
> name: ccs 
> state: up+stopped 
> description: local image is primary 
> last_update: 2021-01-21 11:46:28 
> 
> root@Ccscephtest1:/etc/pve/priv# rbd snap ls --all 
> CephTestPool1/vm-100-disk-1 
> SNAPID NAME SIZE PROTECTED TIMESTAMP NAMESPACE 
> 11532 TestSnapper 2 TiB Thu Jan 21 11:21:25 2021 user 
> 11573 
> .mirror.primary.a04e92df-3d64-4dc4-8ac8-eaba17b45403.9525e4eb-41c0-499c-8879-0c7d9576e253
>  2 TiB Thu Jan 21 11:35:00 2021 mirror (primary 
> peer_uuids:[debf975b-ebb8-432c-a94a-d3b101e0f770]) 
> 
> Seems like the sync is complete, So I then clone it, map it and attempt to 
> mount it. 

Can you run "snap ls --all" on the non-primary cluster? The 
non-primary snapshot will list its status. On my cluster (with a much 
smaller image): 

# 
# CLUSTER 1 
# 
$ rbd --cluster cluster1 create --size 1G mirror/image1 
$ rbd --cluster cluster1 mirror image enable mirror/image1 snapshot 
Mirroring enabled 
$ rbd --cluster cluster1 device map -t nbd mirror/image1 
/dev/nbd0 
$ mkfs.ext4 /dev/nbd0 
mke2fs 1.45.5 (07-Jan-2020) 
Discarding device blocks: done 
Creating filesystem with 262144 4k blocks and 65536 inodes 
Filesystem UUID: 50e0da12-1f99-4d45-b6e6-5f7a7decaeff 
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376 

Allocating group tables: done 
Writing inode tables: done 
Creating journal (8192 blocks): done 
Writing superblocks and filesystem accounting information: done 
$ blkid /dev/nbd0 
/dev/nbd0: UUID="50e0da12-1f99-4d45-b6e6-5f7a7decaeff" 
BLOCK_SIZE="4096" TYPE="ext4" 
$ rbd --cluster cluster1 snap create mirror/image1@fs 
Creating snap: 100% complete...done. 
$ rbd --cluster cluster1 mirror image snapshot mirror/image1 
Snapshot ID: 6 
$ rbd --cluster cluster1 snap ls --all mirror/image1 
SNAPID NAME 
SIZE PROTECTED TIMESTAMP 
NAMESPACE 
5 fs 
1 GiB Thu Jan 21 14:50:24 2021 
user 
6 
.mirror.primary.f9f692b8-2405-416c-9247-5628e303947a.39722e17-f7e6-4050-acf0-3842a5620d81
 
1 GiB Thu Jan 21 14:50:51 2021 mirror (primary 
peer_uuids:[cd643f30-4982-4caf-874d-cf21f6f4b66f]) 

# 
# CLUSTER 2 
# 

$ rbd --cluster cluster2 mirror image status mirror/image1 
image1: 
global_id: f9f692b8-2405-416c-9247-5628e303947a 
state: up+replaying 
description: replaying, 
{"bytes_per_second":1140872.53,"bytes

[ceph-users] Re: Dashboard : Block image listing and infos

2021-01-21 Thread Gilles Mocellin
Hi!

I'm responding to the list, as it may help others.
I've also reordered the response.

> On Mon, Jan 18, 2021 at 2:41 PM Gilles Mocellin <
> 
> gilles.mocel...@nuagelibre.org> wrote:
> > Hello Cephers,
> > 
> > On a new cluster, I only have 2 RBD block images, and the Dashboard
> > doesn't manage to list them correctly.
> > 
> > I have this message :
> >Warning
> >Displaying previously cached data for pool veeam-repos.
> > 
> > Sometime it disappears, but as soon as I reload or return to the listing
> > page, it's there.
> > 
> > What I've seen, is a high CPU load due to ceph-mgr on the active
> > manager.
> > And also stack-traces like this :
[...]
> > dashboard.exceptions.ViewCacheNoDataException: ViewCache: unable to
> > retrieve data
> > 
> > I also see that, when I try to edit an image :
> > 
> > 2021-01-18T11:13:26.383+0100 7f00199ca700  0 [dashboard ERROR
> > frontend.error]
> > (https://fidcl-mrs4-sto-sds.fidcl.cloud:8443/#/block/rbd/edit/veeam-> > 
> > repos%252Fveeam-repo2-vol1
> >  > 252Fveeam-repo2-vol1>): Cannot read property 'features_name' of
> > undefined
> > 
> >   TypeError: Cannot read property 'features_name' of undefined
[...]
> > 
> > But that's perhaps just becaus I open an Edit window on the image and it
> > does not have the datas.
> > The Edit window is empty, and I can't edit things, especially, I wan't
> > to resize the image.
> > 
[...]
> > --
> > Gilles

On Thursday, 21 January 2021 at 21:56:58 CET, Ernesto Puerta wrote:
> Hey Gilles,
> 
> If I'm not wrong, that exception (ViewCacheNoDataException) happens when
> the dashboard is unable to gather all required data from Ceph within a
> defined timeout (5 secs I think, since the UI refreshes the data every ~5
> seconds).
> 
> It'd be great if you could provide the steps to reproduce it and some
> insights into your environment (number of RBD pools, number of RBD images,
> snapshots, etc.).
> 
> Kind Regards,
> 
> Ernesto

OK, 
As it is now, it always happens: on the image listing I get the warning and 
the list is not always up to date; if I create an image, I must wait a very 
long time before it shows up.
Also, I cannot edit the 2 big images I have. Perhaps the size is important: 
they are 2 images of 40 TB each.
If I create a 1 GB test image, I can edit and resize it.
But it is impossible with the big images: the window opens but all the fields 
are empty.

Also, in case it matters, the images use a data pool (EC 3+2).

I have 2 pools: a replicated one for metadata, veeam-repos (replica x3), and a 
data pool, veeam-repos.data (EC 3+2).
My cluster has 6 nodes, each with a 16-core AMD CPU, 128 GB RAM and 10x 8 TB 
HDDs, so 60 OSDs. We will soon be doubling everything to 12 nodes.

Usage, as the pool and image names suggest, is to mount an RBD image as an XFS 
filesystem for a Veeam Backup Repository (krbd, because rbd-nbd failed 
regularly, especially during fstrim).
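
If it would help, I can also collect more verbose data while reproducing --
something along these lines, if I have the commands right:

$ ceph dashboard debug enable
$ ceph config set mgr debug_mgr 10
# reproduce the listing / edit problem, then grab the active mgr log
$ ceph dashboard debug disable
$ ceph config rm mgr debug_mgr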


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread Chris Dunlop

On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:

Hi,

Am 21.01.21 um 05:42 schrieb Chris Dunlop:


Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
it just "this is crazy large, if you're trying to go over this you're
doing something wrong, rethink your life..."?


IMHO the limit is there because of the way deletion of RBDs work. "rbd
rm" has to look for every object, not only the ones that were really
created. This would make deleting a very very large RBD take a very very
long time.


I wouldn't have thought the Ceph designers would have put in a hard limit like
that just to protect people from a long deletion time.

The removal time may well be a consideration for some but it's not a
significant issue in this case as the filesystem is intended to last for years
(the XFS and ZFS it's meant to replace have been around for maybe a decade).

That said, it does take a while. For a 976T rbd (the largest possible w/
default 4M objects) with a small amount written to it (maybe 4T):

$ rbd info rbd.meta/fs
rbd image 'fs':
size 976 TiB in 255852544 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 8126791dce2ad3
data_pool: rbd.ec.data
block_name_prefix: rbd_data.22.8126791dce2ad3
format: 2
features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, data-pool
op_features:
flags:
create_timestamp: Thu Jan 21 14:03:38 2021
access_timestamp: Thu Jan 21 14:03:38 2021
modify_timestamp: Thu Jan 21 14:03:38 2021

$ time rbd remove rbd.meta/fs
real    117m31.183s
user    116m56.895s
sys     0m2.101s

The issue is the number of objects. For instance, the same size rbd (976T) but
created with "--object-size 16M":

$ rbd info rbd.meta/fs
rbd image 'fs':
size 976 TiB in 63963136 objects
order 24 (16 MiB objects)
...
$ time rbd remove rbd.meta/fs
real    7m23.326s
user    6m45.201s
sys     0m1.272s

I don't know if the amount written affects the rbd removal time.
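
(With fast-diff enabled, as in the rbd info above, a quick way to see how much
has actually been written is "rbd du", e.g.:

$ rbd du rbd.meta/fs

though I haven't timed removals against that.)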


Rather than a single large rbd, should I be looking at multiple smaller
rbds linked together using lvm or somesuch? What are the tradeoffs?


IMHO there are no tradeoffs, there could even be benefits creating a
volume group with multiple physical volumes on RBD as the requests can
be bettere parallelized (i.e. virtio-single SCSI controller for qemu).


That's a good point, I hadn't considered potential i/o bandwidth benefits.
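
For the record, a rough sketch of what I'd be trying (image names, sizes and
device paths are placeholders, and it assumes either a recent enough kernel
for the default image features or rbd-nbd instead of "rbd map"):

$ for i in 1 2 3 4; do rbd create --size 100T --data-pool rbd.ec.data rbd.meta/fs-pv$i; done
$ for i in 1 2 3 4; do rbd map rbd.meta/fs-pv$i; done
$ pvcreate /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
$ vgcreate vg_fs /dev/rbd0 /dev/rbd1 /dev/rbd2 /dev/rbd3
$ lvcreate -n lv_fs -l 100%FREE -i 4 -I 4M vg_fs   # striped across the 4 PVs
$ mkfs.xfs /dev/vg_fs/lv_fs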

Thanks,

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread Jason Dillaman
On Thu, Jan 21, 2021 at 6:18 PM Chris Dunlop  wrote:
>
> On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:
> > Hi,
> >
> > Am 21.01.21 um 05:42 schrieb Chris Dunlop:
> >
> >> Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
> >> it just "this is crazy large, if you're trying to go over this you're
> >> doing something wrong, rethink your life..."?
> >
> > IMHO the limit is there because of the way deletion of RBDs work. "rbd
> > rm" has to look for every object, not only the ones that were really
> > created. This would make deleting a very very large RBD take a very very
> > long time.
>
> I wouldn't have though the ceph designers would have put in a hard limit like
> that just to protect people from a long time to delete.

You are free to disable the object-map when creating large images by
specifying the image-features -- or you can increase the object size
from its default 4MiB allocation size (which is honestly really no
different from QCOW2 increasing its backing cluster size as
the image grows larger).

The issue is that a 1PiB image w/ 4MiB objects has 268,435,456 backing
objects, so its object-map will require 64MiB of memory to store. It also
just so happens that Ceph has a
hard-limit on the maximum object size of around 90MiB if I recall
correctly.
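
(The object-map stores 2 bits per object, hence 268,435,456 x 2 bits = 64MiB.)
As a rough sketch of the two workarounds -- a larger object size, or dropping
the object-map/fast-diff features -- reusing the pool names from earlier in
the thread:

$ rbd create --size 976T --object-size 16M --data-pool rbd.ec.data rbd.meta/fs

or, for an existing image:

$ rbd feature disable rbd.meta/fs fast-diff object-map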

> The removal time may well be a consideration for some but it's not a
> significant issue in this case as the filesystem is intended to last for years
> (the XFS and ZFS it's meant to replace have been around for maybe a decade).
>
> That said, it does take a while. For a 967T rbd (the largest possible w/
> default 4M objects) with a small amount written to it (maybe 4T):
>
> $ rbd info rbd.meta/fs
> rbd image 'fs':
>  size 976 TiB in 255852544 objects
>  order 22 (4 MiB objects)
>  snapshot_count: 0
>  id: 8126791dce2ad3
>  data_pool: rbd.ec.data
>  block_name_prefix: rbd_data.22.8126791dce2ad3
>  format: 2
>  features: layering, exclusive-lock, object-map, fast-diff, 
> deep-flatten, data-pool
>  op_features:
>  flags:
>  create_timestamp: Thu Jan 21 14:03:38 2021
>  access_timestamp: Thu Jan 21 14:03:38 2021
>  modify_timestamp: Thu Jan 21 14:03:38 2021
>
> $ time rbd remove rbd.meta/fs
> real117m31.183s
> user116m56.895s
> sys 0m2.101s
>
> The issue is the number of objects. For instance, the same size rbd (967T) but
> created with "--object-size 16M":
>
> $ rbd info rbd.meta/fs
> rbd image 'fs':
>  size 976 TiB in 63963136 objects
>  order 24 (16 MiB objects)
>  ...
> $ time rbd remove rbd.meta/fs
> real7m23.326s
> user6m45.201s
> sys 0m1.272s
>
> I don't know if the amount written affects the rbd removal time.

When the object-map is enabled, only written data extents need to be
deleted. W/o the object-map, it would need to issue deletes against
all possible objects.

> >> Rather than a single large rbd, should I be looking at multiple smaller
> >> rbds linked together using lvm or somesuch? What are the tradeoffs?
> >
> > IMHO there are no tradeoffs, there could even be benefits creating a
> > volume group with multiple physical volumes on RBD as the requests can
> > be bettere parallelized (i.e. virtio-single SCSI controller for qemu).
>
> That's a good point, I hadn't considered potential i/o bandwidth benefits.
>
> Thanks,
>
> Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Large rbd

2021-01-21 Thread Chris Dunlop

On Thu, Jan 21, 2021 at 07:52:00PM -0500, Jason Dillaman wrote:

On Thu, Jan 21, 2021 at 6:18 PM Chris Dunlop  wrote:

On Thu, Jan 21, 2021 at 10:57:49AM +0100, Robert Sander wrote:

Am 21.01.21 um 05:42 schrieb Chris Dunlop:

Is there any particular reason for that MAX_OBJECT_MAP_OBJECT_COUNT, or
it just "this is crazy large, if you're trying to go over this you're
doing something wrong, rethink your life..."?


IMHO the limit is there because of the way deletion of RBDs work. "rbd
rm" has to look for every object, not only the ones that were really
created. This would make deleting a very very large RBD take a very very
long time.


I wouldn't have though the ceph designers would have put in a hard limit like
that just to protect people from a long time to delete.


You are free to disable the object-map when creating large images by
specifying the image-features -- or you can increase the object size
from its default 4MiB allocation size (which is honestly really no
different from QCOW2 switching increasing the backing cluster size as
the image grows larger).

The issue is that the size for the object-map for a 1PiB image w/ 4MiB
objects is going to be 268,435,456 backing objects which will require
64MiB of memory to store. It also just so happens that Ceph has a
hard-limit on the maximum object size of around 90MiB if I recall
correctly.


Is the whole object-map memory all held "in-core" the whole time, or is it 
retrieved / freed as needed (for a busy filesystem that might mean it's 
in-core practically the whole time, but for a quiet filesystem maybe not 
so much)?


In this case the server has 192G RAM so 64MiB doesn't sound so scary.

By the way - does that 64MiB also match approximately how much fast/replicated 
storage I'll need if the data is on an EC volume?


I'm looking at 16M or larger objects; however, I'm concerned about 
fragmentation and how that might affect the "thinness" of the volume. It 
seems that with larger objects, and in the face of file removals in the upper 
filesystem where many files are smaller than the object size, there's far more 
chance for many objects to end up significantly (but not completely) empty, 
blowing out the actual storage used compared to the logical storage used 
in the upper filesystem. Trimming won't help with partially allocated 
objects.


Actually, maybe that's a good reason to not create a humongous fs in the 
first place.


The XFS devs seem comfortable with growing a fs "a bit", e.g. 2-5 times 
original size, but seemingly at about 10 times it's getting a bit dodgy.


So, creating a smaller fs in the first place (with the expectation it may 
grow to 2-5 times original) means the fs itself will be encouraged to 
reuse the space in larger objects rather than spreading itself out and 
leaving a large number of partially filled objects.
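
(Growing later looks cheap on both layers anyway -- roughly, with placeholder 
names:

$ rbd resize --size 200T rbd.meta/fs
$ xfs_growfs /mnt/fs

so starting smaller shouldn't cost much flexibility.)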



I don't know if the amount written affects the rbd removal time.


When the object-map is enabled, only written data extents need to be
deleted. W/o the object-map, it would need to issue deletes against
all possible objects.


How does the lack of an object-map affect trimming? That's a very 
important factor for a large thin volume like this.


Thanks,

Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io