You could try manually deleting the damaged dentries from the directory fragment objects in the metadata pool using `rados` omap commands. Make sure to flush your MDS journal first and take the fs offline (`ceph fs fail`).
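Roughly, a minimal (untested) sketch of the steps, assuming a file system named "cephfs" and a metadata pool named "cephfs_metadata"; those names are placeholders, as are the dirfrag object "10000000000.00000000" (the parent directory's inode number in hex plus the fragment id) and the damaged file name "badfile". Head dentries live in the dirfrag object as omap keys named "<filename>_head":

  # flush the MDS journal, then take the fs offline
  ceph tell mds.cephfs:0 flush journal
  ceph fs fail cephfs

  # list the dentries stored in the directory fragment object
  rados -p cephfs_metadata listomapkeys 10000000000.00000000

  # back up the dentry value before touching it
  rados -p cephfs_metadata getomapval 10000000000.00000000 badfile_head /root/badfile_head.bin

  # remove the corrupt dentry (omap key) from the dirfrag object
  rados -p cephfs_metadata rmomapkey 10000000000.00000000 badfile_head

  # bring the fs back online and scrub to verify
  ceph fs set cephfs joinable true
  ceph tell mds.cephfs:0 scrub start / recursive,repair

You may also need to clear the corresponding entries from the damage table afterwards ('ceph tell mds.cephfs:0 damage ls' and 'damage rm <id>').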
On Tue, Jun 4, 2024 at 8:50 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> Hi Patrick,
>
> it has been a year now and we did not have a single crash since upgrading to
> 16.2.13. We still have the 19 corrupted files which are reported by 'damage
> ls'. Is it now possible to delete the corrupted files without taking the
> filesystem offline?
>
> On 22.05.2023 at 20:23, Patrick Donnelly <pdonn...@redhat.com> wrote:
>
> Hi Felix,
>
> On Sat, May 13, 2023 at 9:18 AM Stolte, Felix <f.sto...@fz-juelich.de> wrote:
>
> Hi Patrick,
>
> we have been running one daily snapshot since December and our cephfs crashed
> 3 times because of this: https://tracker.ceph.com/issues/38452
>
> We currently have 19 files with corrupt metadata found by your
> first-damage.py script. We isolated these files from access by users and
> are waiting for a fix before we remove them with your script (or maybe a new
> way?)
>
> No other fix is anticipated at this time. Probably one will be
> developed after the cause is understood.
>
> Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the
> mds servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0
> damage ls' is showing me the same files as your script (initially only a
> part, after a cephfs scrub all of them).
>
> This is expected. Once the dentries are marked damaged, the MDS won't
> allow operations on those files (like those triggering tracker
> #38452).
>
> I noticed "mds: catch damage to CDentry's first member before persisting
> (issue#58482, pr#50781, Patrick Donnelly)" in the change logs for 16.2.13
> and would like to ask you the following questions:
>
> a) can we repair the damaged files online now instead of bringing down the
> whole fs and using the python script?
>
> Not yet.
>
> b) should we set one of the new mds options in our specific case to avoid our
> fileserver crashing because of the wrong snap ids?
>
> Have your MDS crashed or just marked the dentries damaged? If you can
> reproduce a crash with detailed logs (debug_mds=20), that would be
> incredibly helpful.
>
> c) will your patch prevent wrong snap ids in the future?
>
> It will prevent persisting the damage.
>
> --
> Patrick Donnelly, Ph.D.
> He / Him / His
> Red Hat Partner Engineer
> IBM, Inc.
> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> Kind regards
> Felix Stolte
>
> IT-Services
> mailto: f.sto...@fz-juelich.de
> Tel: 02461-619243

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io