[ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

2023-12-05 Thread Manolis Daramas
Hi Eugen, $ sudo ceph osd tree (output below):

ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         2.05046  root default
-3         0.68349      host node01
 0  hdd    0.14650          osd.0        up       1.0      1.0
 4  hdd    0.04880          osd.4        up       1.0

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread David Rivera
First problem here is you are using crush-failure-domain=osd when you should use crush-failure-domain=host. With three hosts, you would have to use k=2, m=1; this is not recommended in a production environment. On Mon, Dec 4, 2023, 23:26 duluxoz wrote: > Hi All, > > Looking for some help/explanation aro
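
As a minimal sketch of the suggested fix (the profile and pool names below are made up for illustration), an EC profile with a host failure domain would be created along these lines; note that k and m of an existing pool cannot be changed, so in practice this means creating a new pool with the new profile and migrating the data:

$ ceph osd erasure-code-profile set ec-k2m1-host k=2 m=1 crush-failure-domain=host
$ ceph osd erasure-code-profile get ec-k2m1-host
$ ceph osd pool create ecpool-new erasure ec-k2m1-host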

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread duluxoz
Thanks David, I knew I had something wrong :-) Just for my own edification: Why is k=2, m=1 not recommended for production? Considered too "fragile", or something else? Cheers Dulux-Oz On 05/12/2023 19:53, David Rivera wrote: First problem here is you are using crush-failure-domain=osd when

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Eugen Block
And the second issue is that with k=4, m=2 you'll have min_size = 5, which means if one host is down your PGs become inactive, which is what you most likely experienced. Zitat von David Rivera: First problem here is you are using crush-failure-domain=osd when you should use crush-failure-domain=hos
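
For reference, the effective min_size of an existing pool can be checked directly (the pool name here is a placeholder); for an EC pool the default is k+1, hence 5 for k=4, m=2:

$ ceph osd pool get ecpool min_size
min_size: 5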

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Robert Sander
On 12/5/23 10:01, duluxoz wrote: Thanks David, I knew I had something wrong :-) Just for my own edification: Why is k=2, m=1 not recommended for production? Considered too "fragile", or something else? It is the same as a replicated pool with size=2. Only one host can go down. After that you

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Danny Webb
Usually EC requires at least k+1 chunks to be up and active for the pool to be working. Setting the min value to k risks data loss. From: duluxoz Sent: 05 December 2023 09:01 To: rivera.davi...@gmail.com ; matt...@peregrineit.net Cc: ceph-users@ceph.io Subject: [ceph-

[ceph-users] Re: ceph-users Digest, Vol 114, Issue 14

2023-12-05 Thread duluxoz
Hi Zitat, I'm confused - doesn't k4 m2 mean that you can lose any 2 out of the 6 OSDs? Cheers Dulux-Oz On 05/12/2023 20:02, ceph-users-requ...@ceph.io wrote: Send ceph-users mailing list submissions to ceph-users@ceph.io To subscribe or unsubscribe via email, send a message with s

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Robert Sander
On 12/5/23 10:06, duluxoz wrote: I'm confused - doesn't k4 m2 mean that you can lose any 2 out of the 6 OSDs? Yes, but OSDs are not a good failure zone. The host is the smallest failure zone that is practicable and safe against data loss. Regards -- Robert Sander Heinlein Consulting GmbH S

[ceph-users] Re: MDS stuck in up:rejoin

2023-12-05 Thread Eric Tittley
Hi Venky, > The recently crashed daemon is likely the MDS which you mentioned in your subsequent email. The "recently crashed daemon" was the osd.51 daemon which was in the metadata pool. But yes, in the process of trying to get the system running, I probably did a few steps that were unnece

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Danny Webb
Sort of. It means you can lose 2 and have no data loss. But Ceph will do its best to protect you from data loss by offlining the pool until the required number of chunks is up. See min_size here: https://docs.ceph.com/en/latest/rados/operations/pools/ From: R

[ceph-users] Re: After hardware failure tried to recover ceph and followed instructions for recovery using OSDS

2023-12-05 Thread Eugen Block
The backfill_toofull OSDs could be the reason why the MDS won't become active, though I'm not sure; it could also be the unfound object. I would try to get the third MON online, probably with an empty MON store. Or do you have any specific error messages why it won't start? Add the relevant outpu

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Patrick Begou
Hi Robert, On 05/12/2023 at 10:05, Robert Sander wrote: On 12/5/23 10:01, duluxoz wrote: Thanks David, I knew I had something wrong :-) Just for my own edification: Why is k=2, m=1 not recommended for production? Considered too "fragile", or something else? It is the same as a replicated p

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Rich Freeman
On Tue, Dec 5, 2023 at 5:16 AM Patrick Begou wrote: > > On my side, I'm working on building my first (small) Ceph cluster using > E.C. and I was thinking about 5 nodes and k=4 m=2. With a failure domain > on host and several OSDs per node, in my mind this setup may run degraded > with 3 nodes using

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread David C.
Hi Matthew, To make a simplistic comparison, it is generally not recommended to use RAID 5 with large disks (>1 TB) due to the probability (low but not zero) of losing another disk during the rebuild. So imagine losing a host full of disks. Additionally, min_size=1 means you can no longer maintain yo

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread David C.
Hi, To return to my comparison with SANs, on a SAN you have spare disks to repair a failed disk. On Ceph, you therefore need at least one more host (k+m+1). If we take into consideration the formalities/delivery times of a new server, k+m+2 is not a luxury (depending on the growth of your volume).
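
To put rough numbers on that, assuming the k=4, m=2 profile discussed elsewhere in this thread with a host failure domain:

k+m   = 6 hosts: bare minimum to place every chunk on a separate host
k+m+1 = 7 hosts: one spare host available to backfill onto after a failure
k+m+2 = 8 hosts: extra headroom while a replacement server is procured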

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Patrick Begou
Ok, so I've misunderstood the meaning of failure domain. If there is no way to request using 2 OSDs per node with node as the failure domain, then with 5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a RAID 1 setup. A little bit better than replication from the point of view of glob

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread David C.
Hi Patrick, If your hardware is new, you are confident in its support, and you can consider future expansion, you can possibly start with k=3 and m=2. It is true that we generally prefer k (the number of data chunks) to be a power of two, but k=3 does the job. Be careful, it is difficult/pai
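
A minimal sketch of such a profile (the names are placeholders); on recent releases the resulting pool gets min_size = k+1 = 4 by default:

$ ceph osd erasure-code-profile set ec-k3m2 k=3 m=2 crush-failure-domain=host
$ ceph osd pool create ecpool-k3m2 erasure ec-k3m2
$ ceph osd pool get ecpool-k3m2 min_size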

[ceph-users] Re: MDS stuck in up:rejoin

2023-12-05 Thread Venky Shankar
Hi Eric, On Tue, Dec 5, 2023 at 3:43 PM Eric Tittley wrote: > > Hi Venky, > > > The recently crashed daemon is likely the MDS which you mentioned in > > your subsequent email. > > The "recently crashed daemon" was the osd.51 daemon which was in the > metadata pool. > > But yes, in the process of

[ceph-users] Re: MDS stuck in up:rejoin

2023-12-05 Thread Eric Tittley
On 05/12/2023 12:50, Venky Shankar wrote: Hi Eric, On Tue, Dec 5, 2023 at 3:43 PM Eric Tittley wrote: Hi Venky,

[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-05 Thread Zakhar Kirpichenko
Any input from anyone? /Z On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko wrote: > Hi, > > Just to reiterate, I'm referring to an OSD crash loop because of the > following error: > > "2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400 > /var/lib/ceph/osd/ceph-56/block) _aio_thread

[ceph-users] Re: [ext] CephFS pool not releasing space after data deletion

2023-12-05 Thread Kuhring, Mathias
Hey Frank, hey Venky, Thanks for looking into this. We are not sure yet if all the expected capacity is or will be released. Eventually, we just continued further cleaning out old data from the old pool. This is still in progress, but with other data sets in this old pool we indeed observed re

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Rich Freeman
On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou wrote: > > Ok, so I've misunderstood the meaning of failure domain. If there is no > way to request using 2 osd/node and node as failure domain, with 5 nodes > k=3+m=1 is not secure enough and I will have to use k=2+m=2, so like a > raid1 setup. A litt

[ceph-users] Re: EC Profiles & DR

2023-12-05 Thread Christian Wuerdig
You can structure your crush map so that you get multiple EC chunks per host in a way that you can still survive a host outage even though you have fewer hosts than k+1. For example, if you run an EC=4+2 profile on 3 hosts you can structure your crush map so that you have 2 chunks per host. Thi
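
A sketch of what such a rule can look like (the rule name, id, and pool name are arbitrary, and the default CRUSH root is assumed): it picks 3 hosts and then 2 OSDs within each, yielding the 6 chunks needed for a 4+2 profile. The usual workflow is to decompile the CRUSH map, add the rule, recompile and inject it, then assign the rule to the EC pool:

$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt
# add to crush.txt:
rule ec42_two_per_host {
    id 42
    type erasure
    step set_chooseleaf_tries 5
    step set_choose_tries 100
    step take default
    step choose indep 3 type host
    step choose indep 2 type osd
    step emit
}
$ crushtool -c crush.txt -o crush.new
$ ceph osd setcrushmap -i crush.new
$ ceph osd pool set <pool> crush_rule ec42_two_per_host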

[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-05 Thread Tyler Stachecki
On Tue, Dec 5, 2023 at 10:13 AM Zakhar Kirpichenko wrote: > > Any input from anyone? > > /Z It's not clear whether or not these issues are related. I see three things in this e-mail chain: 1) bdev() _aio_thread with EPERM, as in the subject of this e-mail chain 2) bdev() _aio_thread with the I/O

[ceph-users] OSD CPU and write latency increase after upgrade from 15.2.16 to 17.2.6

2023-12-05 Thread Tony Yao
Hi, Recently, I upgraded Ceph from 15.2.16 to 17.2.6, but I found that OSD CPU usage increased from 30% to 90% or more, and OSD subop_w_latency increased from 600us to 5ms. This is incredible. My hardware environment: 12 nodes x 12 NVMe (Intel P4510 4T). I tried to set the OSD configuration t
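
For anyone wanting to compare numbers, the write latency counters can be read straight from a running OSD (osd.0 is just an example, and jq is optional):

$ ceph tell osd.0 perf dump | jq '.osd.subop_w_latency'
# or on the OSD host, via the admin socket:
$ ceph daemon osd.0 perf dump osd subop_w_latency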

[ceph-users] Re: Ceph 16.2.14: osd crash, bdev() _aio_thread got r=-1 ((1) Operation not permitted)

2023-12-05 Thread Zakhar Kirpichenko
Thank you, Tyler. Unfortunately (or fortunately?) the drive is fine in this case: there were no errors reported by the kernel at the time, and I successfully managed to run a bunch of tests on the drive for many hours before rebooting the host. The drive has worked without any issues for 3 days now

[ceph-users] Re: Space reclaim doesn't happening in nautilus RBD pool

2023-12-05 Thread Szabo, Istvan (Agoda)
Hi, Seems like the sparsify and manual fstrim are doing what they need to do. When sparsifying the image, if the image has snapshots, let's say 3 snapshots, you need to wait until it rotates all of them (remove and create a new set instead). I think it reclaims some of it too but I guess it's up to free space o
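
For reference, the commands involved are roughly these (the pool, image, and mount point names are placeholders):

$ rbd sparsify mypool/myimage      # deallocate fully zeroed extents in the image
$ fstrim -v /mnt/rbd               # discard unused blocks from the mounted filesystem
$ rbd snap ls mypool/myimage       # snapshots keep referencing old extents until removed
$ rbd du mypool/myimage            # compare provisioned vs. actual usage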