Hi Eugen,
$ sudo ceph osd tree (output below):
ID  CLASS  WEIGHT   TYPE NAME        STATUS  REWEIGHT  PRI-AFF
-1         2.05046  root default
-3         0.68349      host node01
 0    hdd  0.14650          osd.0        up       1.0      1.0
 4    hdd  0.04880          osd.4        up       1.0
First problem here is that you are using crush-failure-domain=osd when you
should use crush-failure-domain=host. With three hosts, you should use k=2,
m=1; this is not recommended in a production environment.
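For reference, a minimal sketch of creating such a profile with a host
failure domain and a pool that uses it (profile and pool names here are
only placeholders; pg_num is omitted on the assumption that the autoscaler
is enabled):

$ ceph osd erasure-code-profile set ec-host-profile \
      k=2 m=1 crush-failure-domain=host
$ ceph osd erasure-code-profile get ec-host-profile
$ ceph osd pool create ecpool erasure ec-host-profile

Note that the profile of an existing EC pool cannot be changed in place;
the data has to be migrated to a pool created with the new profile.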
On Mon, Dec 4, 2023, 23:26 duluxoz wrote:
> Hi All,
>
> Looking for some help/explanation aro
Thanks David, I knew I had something wrong :-)
Just for my own edification: why is k=2, m=1 not recommended for
production? Is it considered too "fragile", or is it something else?
Cheers
Dulux-Oz
On 05/12/2023 19:53, David Rivera wrote:
First problem here is you are using crush-failure-domain=osd when
And the second issue is that with k=4, m=2 you'll have min_size = 5, which
means that if one host is down your PGs become inactive, which is most
likely what you experienced.
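For anyone following along, the effective value can be checked per pool
(pool name is a placeholder); the default min_size for an EC pool is k+1,
hence 5 for k=4, m=2:

$ ceph osd pool get ecpool min_size
$ ceph osd pool ls detail   # also shows size, min_size and crush rule per pool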
Zitat von David Rivera :
First problem here is you are using crush-failure-domain=osd when you
should use crush-failure-domain=hos
On 12/5/23 10:01, duluxoz wrote:
Thanks David, I knew I had something wrong :-)
Just for my own edification: why is k=2, m=1 not recommended for
production? Is it considered too "fragile", or is it something else?
It is the same as a replicated pool with size=2. Only one host can go
down. After that you
Usually EC requires at least k+1 chunks to be up and active for the pool to
be working. Setting min_size to k risks data loss.
From: duluxoz
Sent: 05 December 2023 09:01
To: rivera.davi...@gmail.com ;
matt...@peregrineit.net
Cc: ceph-users@ceph.io
Subject: [ceph-
Hi Zitat,
I'm confused - doesn't k=4, m=2 mean that you can lose any 2 out of the 6
OSDs?
Cheers
Dulux-Oz
On 12/5/23 10:06, duluxoz wrote:
I'm confused - doesn't k=4, m=2 mean that you can lose any 2 out of the 6
OSDs?
Yes, but OSDs are not a good failure zone.
The host is the smallest failure zone that is practicable and safe
against data loss.
Regards
--
Robert Sander
Heinlein Consulting GmbH
S
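Following on from Robert's point: if in doubt about which failure domain a
pool actually uses, it can be verified from its CRUSH rule and EC profile
(the names below are placeholders):

$ ceph osd pool get ecpool crush_rule
$ ceph osd crush rule dump ecpool-rule
$ ceph osd erasure-code-profile get ec-host-profile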
Hi Venky,
The recently crashed daemon is likely the MDS which you mentioned in
your subsequent email.
The "recently crashed daemon" was the osd.51 daemon which was in the
metadata pool.
But yes, in the process of trying to get the system running, I probably
did a few steps that were unnece
Sort of. It means you can lose 2 and have no data loss. But Ceph will do
its best to protect you from data loss by offlining the pool until the
required number of chunks is up. See min_size here:
https://docs.ceph.com/en/latest/rados/operations/pools/
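As an aside, min_size can be lowered temporarily to bring such inactive PGs
back online during recovery, but only with great care: running at
min_size = k gives up the safety margin described above (pool name is a
placeholder).

$ ceph osd pool set ecpool min_size 4   # temporary, for a k=4, m=2 pool during recovery
$ ceph osd pool set ecpool min_size 5   # restore the default k+1 afterwards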
From: R
The backfill_toofull OSDs could be the reason why the MDS won't become
active, not sure though, it could also be the unfound object.
I would try to get the third MON online, probably with an empty MON
store. Or do you have any specific error messages why it won't start?
Add the relevant outpu
Hi Robert,
On 05/12/2023 at 10:05, Robert Sander wrote:
On 12/5/23 10:01, duluxoz wrote:
Thanks David, I knew I had something wrong :-)
Just for my own edification: why is k=2, m=1 not recommended for
production? Is it considered too "fragile", or is it something else?
It is the same as a replicated p
On Tue, Dec 5, 2023 at 5:16 AM Patrick Begou
wrote:
>
> On my side, I'm working on building my first (small) Ceph cluster using
> E.C. and I was thinking about 5 nodes and k=4, m=2. With a failure domain
> on host and several OSDs per node, in my mind this setup may run degraded
> with 3 nodes using
Hi Matthew,
To make a simplistic comparison, it is generally not recommended to use
RAID 5 with large disks (>1 TB) due to the (low but not zero) probability
of losing another disk during the rebuild.
So imagine losing a host full of disks.
Additionally, min_size=1 means you can no longer maintain yo
Hi,
To return to my comparison with SANs: on a SAN you have spare disks with
which to repair a failed disk.
On Ceph, you therefore need at least one more host (k+m+1).
If we take into consideration the formalities/delivery times of a new
server, k+m+2 is not a luxury (depending on the growth of your volume).
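To put rough numbers on that for a k=4, m=2 profile with a host failure
domain (illustrative only):

  k+m   = 6 hosts: bare minimum to place all chunks
  k+m+1 = 7 hosts: one spare host so missing chunks can be rebuilt elsewhere
  k+m+2 = 8 hosts: keeps that headroom while a replacement server is on order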
Ok, so I've misunderstood the meaning of failure domain. If there is no
way to request using 2 OSDs/node and node as the failure domain, then with
5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2, so
like a RAID 1 setup. A little bit better than replication from the point
of view of glob
Hi Patrick,
If your hardware is new, you are confident in your hardware support, and
you can consider future expansion, you can possibly start with k=3 and m=2.
It is true that we generally prefer k (the number of data chunks) to be a
power of two, but k=3 does the job.
Be careful, it is difficult/pai
Hi Eric,
On Tue, Dec 5, 2023 at 3:43 PM Eric Tittley wrote:
>
> Hi Venky,
>
> > The recently crashed daemon is likely the MDS which you mentioned in
> > your subsequent email.
>
> The "recently crashed daemon" was the osd.51 daemon which was in the
> metadata pool.
>
> But yes, in the process of
On 05/12/2023 12:50, Venky Shankar wrote:
Hi Eric,
On Tue, Dec 5, 2023 at 3:43 PM Eric Tittley wrote:
Hi Venky,
Any input from anyone?
/Z
On Mon, 4 Dec 2023 at 12:52, Zakhar Kirpichenko wrote:
> Hi,
>
> Just to reiterate, I'm referring to an OSD crash loop because of the
> following error:
>
> "2023-12-03T04:00:36.686+ 7f08520e2700 -1 bdev(0x55f02a28a400
> /var/lib/ceph/osd/ceph-56/block) _aio_thread
Hey Frank, hey Venky,
Thanks for looking into this.
We are not sure yet, if all the expected capacity is or will be released.
Eventually, we just continued further cleaning out old data from the old
pool.
This is still in progress, but with other data sets in this old pool we
indeed observed re
On Tue, Dec 5, 2023 at 6:35 AM Patrick Begou
wrote:
>
> Ok, so I've misunderstood the meaning of failure domain. If there is no
> way to request using 2 OSDs/node and node as the failure domain, then with
> 5 nodes k=3+m=1 is not secure enough and I will have to use k=2+m=2, so
> like a RAID 1 setup. A litt
You can structure your crush map so that you get multiple EC chunks per
host, in a way that lets you still survive a host outage even though
you have fewer hosts than k+1.
For example if you run an EC=4+2 profile on 3 hosts you can structure your
crushmap so that you have 2 chunks per host. Thi
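A rough sketch of what such a CRUSH rule could look like for 4+2 on 3
hosts, edited into a decompiled CRUSH map (rule name and id are
placeholders):

$ ceph osd getcrushmap -o crush.bin
$ crushtool -d crush.bin -o crush.txt
# add a rule along these lines to crush.txt:
#   rule ec42_3hosts {
#       id 5
#       type erasure
#       step set_chooseleaf_tries 5
#       step set_choose_tries 100
#       step take default
#       step choose indep 3 type host
#       step chooseleaf indep 2 type osd
#       step emit
#   }
$ crushtool -c crush.txt -o crush.new.bin
$ ceph osd setcrushmap -i crush.new.bin
$ ceph osd pool set ecpool crush_rule ec42_3hosts

This picks 3 hosts and 2 OSDs under each, so losing a single host costs at
most m=2 chunks; min_size still needs to be considered as discussed earlier
in the thread.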
On Tue, Dec 5, 2023 at 10:13 AM Zakhar Kirpichenko wrote:
>
> Any input from anyone?
>
> /Z
It's not clear whether or not these issues are related. I see three
things in this e-mail chain:
1) bdev() _aio_thread with EPERM, as in the subject of this e-mail chain
2) bdev() _aio_thread with the I/O
Hi,
Recently, I upgraded Ceph from 15.2.16 to 17.2.6, but I found that OSD CPU
usage increased from 30% to 90% or more, and OSD subop_w_latency increased from
600us to 5ms. This is incredible.
My hardware environment:
12 nodes x 12 NVMe (Intel P4510 4T)
I tried to set the OSD configuration t
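In case it helps with the before/after comparison, the latency counters can
be read from the OSDs directly (osd.0 here is just an example):

$ ceph osd perf                                           # commit/apply latency per OSD
$ ceph tell osd.0 perf dump | grep -A 3 subop_w_latency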
Thank you, Tyler. Unfortunately (or fortunately?) the drive is fine in this
case: there were no errors reported by the kernel at the time, and I
successfully managed to run a bunch of tests on the drive for many hours
before rebooting the host. The drive has worked without any issues for 3
days now
Hi,
Seems like the sparsify and manual fstrim are doing what they need to do.
When you sparsify the image, if the image has snapshots (let's say 3
snapshots), you need to wait until it rotates through all of them (removes
them and creates a new set instead).
I think it reclaims some of it too, but I guess it is up to free space o
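For reference, the operations being discussed would look roughly like this
(pool, image and mount point are placeholders):

$ rbd sparsify mypool/myimage   # reclaim zeroed extents of the RBD image
$ fstrim -v /mnt/myimage        # inside the client, discard unused filesystem blocks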