[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-25 Thread Frédéric Nass
Hello Wesley, I couldn't find any tracker related to this, and since min_size=1 has been involved in many critical situations with data loss, I created this one: https://tracker.ceph.com/issues/66641 Regards, Frédéric. - On 17 Jun 24, at 19:14, Wesley Dillingham w...@wesdillingham.com wrote: ...
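For anyone wanting to audit their own cluster for this before such a warning exists, a minimal sketch (the pool name is the one mentioned later in this thread; adjust to your own pools):

    # flag every pool currently running with min_size 1
    ceph osd pool ls detail | grep -E 'min_size 1( |$)'
    # or query a single pool explicitly
    ceph osd pool get cephfs-replicated min_size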

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-24 Thread Jan Marquardt
Hi Pablo, > We are willing to work with a Ceph Consultant Specialist, because the data > at stake is very critical, so if you're interested please let me know > off-list, to discuss the details. I totally understand that you want to communicate with potential consultants off-list, but I, and ma ...

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Anthony D'Atri
Ohhh, so multiple OSD failure domains on a single SAN node? I suspected as much. I've experienced a Ceph cluster built on SanDisk InfiniFlash, which was arguably somewhere between SAN and DAS. Each of 4 IF chassis drove 4x OSD nodes via SAS, but it was zoned such that the chassis was the failure ...
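For reference, when the real failure domain is a chassis or SAN head rather than a host, the usual approach is to model it in the CRUSH map and build the replicated rule on top of it. A rough sketch, with hypothetical bucket/host names:

    # declare a chassis bucket under the default root
    ceph osd crush add-bucket if-chassis-1 chassis
    ceph osd crush move if-chassis-1 root=default
    # re-home the OSD host(s) fed by that chassis
    ceph osd crush move node01 chassis=if-chassis-1
    # replicated rule that spreads copies across chassis instead of hosts
    ceph osd crush rule create-replicated rep-by-chassis default chassis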

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread David C.
In Pablo's unfortunate case the root cause was a SAN incident, so it's possible that even replica 3 wouldn't have saved him. In this scenario, the architecture is more the origin of the incident than the number of replicas. It seems to me that replica 3 has been the default since Firefly => make replica 2, ...

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Wesley Dillingham
Perhaps Ceph itself should also have a warning pop up (in "ceph -s", "ceph health detail" etc.) when a replicated pool runs min_size=1, or for an EC pool when min_size < k+1. Of course it could be muted, but it would give an operator pause when initially setting that. I think a lot of people assume replica size=2 is safe ...
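Checking where a given cluster stands against that rule of thumb is already possible today; a small sketch, with hypothetical EC pool and profile names:

    # replicated pools: look for min_size 1
    ceph osd pool get cephfs-replicated min_size
    # EC pools: min_size should be k+1; compare with k from the pool's profile
    ceph osd pool get ec-data-pool min_size
    ceph osd pool get ec-data-pool erasure_code_profile
    ceph osd erasure-code-profile get my-ec-profile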

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Anthony D'Atri
>> * We use replicated pools
>> * Replica 2, min replicas 1.
Note to self: Change the docs and default to discourage this. This is rarely appropriate in production. You had multiple overlapping drive failures?
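For completeness, moving a pool off the 2/1 combination is a per-pool setting; a minimal sketch using the pool name from this thread, to be run only when the cluster has the capacity for the extra replicas:

    ceph osd pool set cephfs-replicated size 3
    ceph osd pool set cephfs-replicated min_size 2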

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread David C.
1 PG out of 16 missing in the metadata pool is already enough to make browsing the FS very difficult. Your difficulty is locating the important objects in the data pool. Try, perhaps, to target the important objects by retrieving the layout/parent attributes on the objects in the cephfs-replicated pool ...
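One possible way to map a data-pool object back to a path is to decode the backtrace stored in its "parent" xattr; a sketch, where the object name 10000000abc.00000000 is purely an example (the first object of a file is <inode-in-hex>.00000000):

    # dump the encoded backtrace of the object's file
    rados -p cephfs-replicated getxattr 10000000abc.00000000 parent > parent.bin
    # decode it into readable JSON (the ancestor dentries give the original path)
    ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json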

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Matthias Grandl

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread cellosof...@gmail.com
The command for trying the export was:

    [rook@rook-ceph-tools-recovery-77495958d9-plfch ~]$ rados export -p cephfs-replicated /mnt/recovery/backup-rados-cephfs-replicated

We made sure we had enough space for this operation, and mounted the /mnt/recovery path using hostPath in the modified rook "toolbox ...

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread cellosof...@gmail.com
Hi, I understand. We had to re-create the OSDs because of a backing storage hardware failure, so recovering from the old OSDs is not possible. From your current understanding, is there a possibility to at least recover some of the information, at least the fragments that are not missing? I ask this b ...
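One way to salvage whatever is still readable, instead of a single rados export that stops at the first unreadable object, is to pull objects one by one and record the failures; a rough sketch (paths are illustrative, and listing may itself stall on the incomplete PGs):

    mkdir -p /mnt/recovery/objs
    rados -p cephfs-replicated ls > /mnt/recovery/objects.txt
    while read -r obj; do
        # timeout keeps a read that hangs on an incomplete PG from blocking the loop
        timeout 60 rados -p cephfs-replicated get "$obj" "/mnt/recovery/objs/$obj" \
            || echo "$obj" >> /mnt/recovery/failed.txt
    done < /mnt/recovery/objects.txt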

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Matthias Grandl

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread David C.
Pablo, Since some PGs are empty and all OSDs are enabled, I'm not optimistic about the outcome at all. Was the command "ceph osd force-create-pg" executed with missing OSDs? On Mon, Jun 17, 2024 at 17:26, cellosof...@gmail.com wrote: > Hi everyone, > > Thanks for your kind responses > > I k ...
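Before anything destructive like force-create-pg (which discards whatever data the PG still referenced), it can be worth checking what the incomplete PGs themselves report; standard commands, with 3.1a as a placeholder PG id:

    ceph pg ls incomplete
    # recovery_state shows e.g. down_osds_we_would_probe / peering_blocked_by
    ceph pg 3.1a query | jq '.recovery_state'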

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread cellosof...@gmail.com
Hi everyone, Thanks for your kind responses. I know the following is not the best scenario, but sadly I didn't have the opportunity of installing this cluster. More information about the problem:
* We use replicated pools
* Replica 2, min replicas 1.
* Ceph version 17.2.0 (43e2e60a7559d3f46c9d53f ...

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Matthias Grandl
Ah, scratch that, my first paragraph about replicated pools is actually incorrect. If it’s a replicated pool and it shows incomplete, it means the most recent copy of the PG is missing. So ideally you would recover the PG from the dead OSDs in any case, if possible. Matthias Grandl Head Storage Engin ...
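If the old drives or their data directories were still readable, the usual way to pull a PG copy out of a dead OSD and inject it into a live one would be ceph-objectstore-tool; a sketch with placeholder OSD ids and PG id, run against stopped OSDs:

    # on the dead/stopped OSD that still holds the newest copy of the PG
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --op export --pgid 3.1a --file /tmp/pg3.1a.export
    # on a surviving OSD (also stopped), then start it again and let peering finish
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --op import --file /tmp/pg3.1a.export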

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread Matthias Grandl
Hi Pablo, It depends. If it’s a replicated setup, it might be as easy as marking the dead OSDs as lost to get the PGs to recover. In that case it basically just means that you are below the pool's min_size. If it is an EC setup, it might be quite a bit more painful, depending on what happened to t ...
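"Marking dead OSDs as lost" is a single command per OSD; a sketch with a placeholder id, only appropriate once the OSD is truly unrecoverable, since it tells Ceph to stop waiting for that copy:

    ceph osd lost 12 --yes-i-really-mean-it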

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-17 Thread David C.
Hi Pablo, Could you tell us a little more about how that happened? Do you have min_size >= 2 (or the EC equivalent)? Regards, *David CASIER* On Mon, Jun 17, 2024 at 16:26, c ...