Hello Wesley,
I couldn't find any tracker related to this and since min_size=1 has been
involved in many critical situations with data loss, I created this one:
https://tracker.ceph.com/issues/66641
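For reference, checking and raising min_size on an existing pool is a one-liner per pool (the pool name below is just the one from this thread, adjust as needed):

    # show current replication settings for a pool
    ceph osd pool get cephfs-replicated size
    ceph osd pool get cephfs-replicated min_size
    # refuse I/O when only one copy is left
    ceph osd pool set cephfs-replicated min_size 2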
Regards,
Frédéric.
- On Jun 17, 2024, at 19:14, Wesley Dillingham w...@wesdillingham.com wrote:
Hi Pablo,
> We are willing to work with a Ceph Consultant Specialist, because the data
> at stage is very critical, so if you're interested please let me know
> off-list, to discuss the details.
I totally understand that you want to communicate with potential consultants
off-list, but I, and many others, …
Ohhh, so multiple OSD failure domains on a single SAN node? I suspected as
much.
I've worked with a Ceph cluster built on SanDisk InfiniFlash, which arguably
sat somewhere between SAN and DAS. Each of the 4 IF chassis drove 4x OSD nodes
via SAS, but it was zoned such that the chassis was the failure domain …
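For the archives: if one chassis really backs several OSD nodes, the chassis can be modelled in CRUSH and used as the failure domain. A rough sketch (bucket, host and pool names here are made up for illustration):

    # create a chassis bucket and hang the hosts under it
    ceph osd crush add-bucket if-chassis-1 chassis
    ceph osd crush move if-chassis-1 root=default
    ceph osd crush move host-a chassis=if-chassis-1
    ceph osd crush move host-b chassis=if-chassis-1
    # replicate across chassis instead of hosts
    ceph osd crush rule create-replicated rep-by-chassis default chassis
    ceph osd pool set cephfs-replicated crush_rule rep-by-chassis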
In Pablo's unfortunate case, the trigger was a SAN incident, so it's possible
that even replica 3 would not have saved him.
In this scenario, the architecture, more than the number of replicas, is the
origin of the incident.
It seems to me that replica 3 has been the default since Firefly => going down
to replica 2 …
Perhaps Ceph itself should also have a warning pop up (in "ceph -s", "ceph
health detail", etc.) when a replicated pool has min_size=1, or when an EC
pool has min_size < k+1. Of course it could be muted, but it would give an
operator pause when initially setting that. I think a lot of people assume
replica size=2 is safe …
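Until something like that exists in Ceph itself, a rough external check is easy to script. A sketch (needs jq; covers replicated pools, an EC pool would additionally need its profile's k to compare against min_size):

    # warn about pools that can lose data after a single failure
    for pool in $(ceph osd pool ls); do
        size=$(ceph osd pool get "$pool" size -f json | jq -r .size)
        min_size=$(ceph osd pool get "$pool" min_size -f json | jq -r .min_size)
        if [ "$size" -le 2 ] || [ "$min_size" -le 1 ]; then
            echo "WARNING: pool $pool has size=$size min_size=$min_size"
        fi
    done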
>>
>> * We use replicated pools
>> * Replica 2, min replicas 1.
Note to self: Change the docs and default to discourage this. This is rarely
appropriate in production.
You had multiple overlapping drive failures?
With 1 PG out of 16 missing in the metadata pool, that alone is enough to make
browsing the FS very difficult.
Your real difficulty is locating the important objects in the data pool.
Try, perhaps, to target the important objects by retrieving the layout/parent
attributes on the objects in the cephfs-replicated pool.
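Concretely, the backtrace ("parent") xattr on the first object of each file can be decoded back into a path. Something along these lines (the object name is only an example; the first stripe of a file always ends in .00000000):

    # pull the backtrace xattr from one object
    rados -p cephfs-replicated getxattr 10000000123.00000000 parent > /tmp/parent
    # decode it into readable ancestor dentries / inode numbers
    ceph-dencoder type inode_backtrace_t import /tmp/parent decode dump_json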
Command for trying the export was:
[rook@rook-ceph-tools-recovery-77495958d9-plfch ~]$ rados export -p
cephfs-replicated /mnt/recovery/backup-rados-cephfs-replicated
We made sure we had enough space for this operation, and mounted the
/mnt/recovery path using hostPath in the modified Rook "toolbox" pod.
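If the export completes, restoring it into a fresh pool should be the mirror operation, roughly like this (the target pool name and PG count are just examples):

    # create an empty target pool and replay the dump into it
    ceph osd pool create cephfs-replicated-restore 16
    rados import -p cephfs-replicated-restore /mnt/recovery/backup-rados-cephfs-replicated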
Hi,
I understand,
We had to re-create the OSDs because of backing storage hardware failure,
so recovering from old OSDs is not possible.
From your current understanding, is there any possibility to at least recover
some of the information, at least the fragments that are not missing?
I ask this because …
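One way to salvage only the fragments that are not missing might be to iterate over the PGs that are still healthy and copy their objects out, rather than a whole-pool export that can block on the incomplete PGs. An untested sketch (the JSON field names may differ slightly between releases):

    # list objects only from PGs that are still active+clean
    for pgid in $(ceph pg ls-by-pool cephfs-replicated active+clean -f json | jq -r '.pg_stats[].pgid'); do
        rados --pgid "$pgid" ls
    done > /mnt/recovery/readable-objects.txt
    # then fetch each readable object individually
    mkdir -p /mnt/recovery/objects
    while read -r obj; do
        rados -p cephfs-replicated get "$obj" "/mnt/recovery/objects/$obj"
    done < /mnt/recovery/readable-objects.txt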
Pablo,
Since some PGs are empty and all OSDs are up, I'm not at all optimistic about
the future.
Was the command "ceph osd force-create-pg" executed while OSDs were missing?
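If one of these PGs is still reported as "incomplete", its peering state would tell us more. For example (the PG id is a placeholder):

    ceph pg 2.f query > /tmp/pg-2.f-query.json
    # the recovery_state section shows why peering is blocked,
    # e.g. peering_blocked_by and down_osds_we_would_probe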
On Mon, Jun 17, 2024 at 17:26, cellosof...@gmail.com
wrote:
> Hi everyone,
>
> Thanks for your kind responses
>
> I know …
Hi everyone,
Thanks for your kind responses
I know the following is not the best scenario, but sadly I didn't have the
opportunity to install this cluster myself.
More information about the problem:
* We use replicated pools
* Replica 2, min replicas 1.
* Ceph version 17.2.0 (43e2e60a7559d3f46c9d53f…)
Ah scratch that, my first paragraph about replicated pools is actually
incorrect. If it’s a replicated pool and it shows incomplete, it means the most
recent copy of the PG is missing. So the ideal would still be to recover the PG
from the dead OSDs, if at all possible.
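For the record, pulling a PG off a dead-but-readable OSD is usually done with ceph-objectstore-tool while the OSDs involved are stopped. A rough sketch (data paths, OSD ids and the PG id are placeholders):

    # export the PG from the old OSD's data directory (OSD stopped)
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.f --op export --file /mnt/recovery/2.f.export
    # import it into a surviving OSD (also stopped), then restart that OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 \
        --op import --file /mnt/recovery/2.f.export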
Matthias Grandl
Head Storage Engineer
Hi Pablo,
It depends. If it’s a replicated setup, it might be as easy as marking dead
OSDs as lost to get the PGs to recover. In that case it basically just means
that you are below the pools min_size.
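Marking a dead OSD as lost is a single, irreversible command, to be used only once it is certain the data on it cannot come back; e.g. for OSD 12:

    ceph osd lost 12 --yes-i-really-mean-it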
If it is an EC setup, it might be quite a bit more painful, depending on what
happened to the OSDs …
Hi Pablo,
Could you tell us a little more about how that happened?
Do you have min_size >= 2 (or the EC equivalent)?
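The quickest way to see this is the pool dump, which prints size/min_size (and the EC profile for EC pools) per pool:

    ceph osd pool ls detail
    # or, equivalently:
    ceph osd dump | grep pool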
Regards,
*David CASIER*
On Mon, Jun 17, 2024 at 16:26, cellosof...@gmail.com wrote: