Hi, we are having an issue at a customer site where a 3 PB CephFS is in a failed state.
The cluster itself is unhealthy and awaits replacement disks:

# ceph -s
  cluster:
    id:     28ca2bfa-d87e-11ed-83a3-1070fddda30f
    health: HEALTH_ERR
            4 failed cephadm daemon(s)
            There are daemons running an older version of ceph
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            8 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 46 pgs backfill_toofull
            Possible data damage: 4 pgs inconsistent
            Degraded data redundancy: 6427646/15858772167 objects degraded (0.041%), 8 pgs degraded, 8 pgs undersized
            6 pool(s) nearfull
            (muted: OSDMAP_FLAGS OSD_SCRUB_ERRORS(2d) PG_NOT_DEEP_SCRUBBED PG_NOT_SCRUBBED)

  services:
    mon: 3 daemons, quorum sn01,sn03,sn02 (age 3w)
    mgr: sn03.crlpzh(active, since 33h), standbys: sn01.tegfya, sn02.mzvgcr
    mds: 18/19 daemons up, 1 standby
    osd: 181 osds: 174 up (since 4d), 172 in (since 4d); 206 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 2/3 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 3585 pgs
    objects: 1.93G objects, 1.3 PiB
    usage:   2.5 PiB used, 501 TiB / 3.0 PiB avail
    pgs:     6427646/15858772167 objects degraded (0.041%)
             293845758/15858772167 objects misplaced (1.853%)
             2844 active+clean
             532  active+clean+scrubbing
             147  active+remapped+backfill_wait
             28   active+remapped+backfill_toofull
             11   active+remapped+backfill_wait+backfill_toofull
             10   active+remapped+backfilling
             6    active+undersized+degraded+remapped+backfill_toofull
             2    active+clean+inconsistent
             1    active+clean+scrubbing+deep+inconsistent+repair
             1    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+remapped+inconsistent+backfill_toofull

  io:
    recovery: 183 MiB/s, 312 objects/s

The CephFS metadata pool is not affected by the inconsistent PGs.

The MDSs have this line in their log files:
"Monitors have assigned me to become a standby."
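A side note on that log line: a rank that the monitors have marked as damaged in the FSMap is not handed to a standby MDS until the damaged flag is cleared. A sketch of the relevant checks, assuming the standard Ceph CLI on this cluster (not yet run here; rank 0 is assumed to be the damaged rank):

```shell
# Inspect the FSMap for damaged ranks; a rank listed as damaged will
# not be assigned to any standby MDS.
ceph fs dump | grep -i damaged

# If the metadata objects for the rank are believed intact, clearing
# the damaged flag allows a standby to take over the rank.
# Assumption: rank 0 of "storage_cluster" is the damaged rank.
ceph mds repaired storage_cluster:0
```

Both commands require admin access to a live cluster; `ceph fs dump` is read-only, while `ceph mds repaired` changes the FSMap.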
The filesystem is joinable:

# ceph fs lsflags storage_cluster
joinable allow_snaps allow_multimds_snaps refuse_client_session

But no MDS joins:

# ceph fs status storage_cluster
storage_cluster - 0 clients
===============
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata    490G  12.9T
cephfs_data        data      970T  54.1T
shared_data        data     1351T  22.5T
STANDBY MDS
storage_cluster.sn04.cbvzzu
MDS version: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)

Why?

Regards
--
Robert Sander
Linux Consultant

Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de
Tel: +49 30 405051 - 0
Fax: +49 30 405051 - 19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io