I'm pretty sure the cause is the damaged MDS rank. If you are able to clear that up, the filesystem should come back up. I saw something like this a few months ago: we were able to simply mark the damaged rank as "repaired" and haven't seen any issues since, but I would discourage doing that without further investigation into what caused the damage in the first place.
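
For what it's worth, the rough sequence would be something like the below (from memory, untested on your cluster; it assumes the damaged rank is rank 0 of storage_cluster, so adjust to whatever the dump actually reports):

Check which rank(s) the MDSMap considers damaged:

# ceph fs dump | grep -i damaged

Look through the cluster log for the event that originally marked the rank damaged, before you clear it:

# ceph log last 1000 | grep -i damage

Only once you are comfortable with the cause, mark the rank repaired so a standby can take it over:

# ceph mds repaired storage_cluster:0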

Regards,

Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868

On 2025-06-30 10:28, Robert Sander wrote:
Hi,

we are having an issue at a customer site where a 3 PB CephFS is in a failed state.

The cluster itself is unhealthy and awaits replacement disks:

# ceph -s
  cluster:
    id:     28ca2bfa-d87e-11ed-83a3-1070fddda30f
    health: HEALTH_ERR
            4 failed cephadm daemon(s)
            There are daemons running an older version of ceph
            1 filesystem is degraded
            1 filesystem is offline
            1 mds daemon damaged
            8 nearfull osd(s)
            Low space hindering backfill (add storage if this doesn't resolve itself): 46 pgs backfill_toofull
            Possible data damage: 4 pgs inconsistent
            Degraded data redundancy: 6427646/15858772167 objects degraded (0.041%), 8 pgs degraded, 8 pgs undersized
            6 pool(s) nearfull
            (muted: OSDMAP_FLAGS OSD_SCRUB_ERRORS(2d) PG_NOT_DEEP_SCRUBBED PG_NOT_SCRUBBED)

  services:
    mon: 3 daemons, quorum sn01,sn03,sn02 (age 3w)
    mgr: sn03.crlpzh(active, since 33h), standbys: sn01.tegfya, sn02.mzvgcr
    mds: 18/19 daemons up, 1 standby
    osd: 181 osds: 174 up (since 4d), 172 in (since 4d); 206 remapped pgs
         flags nodeep-scrub

  data:
    volumes: 2/3 healthy, 1 recovering; 1 damaged
    pools:   12 pools, 3585 pgs
    objects: 1.93G objects, 1.3 PiB
    usage:   2.5 PiB used, 501 TiB / 3.0 PiB avail
    pgs:     6427646/15858772167 objects degraded (0.041%)
             293845758/15858772167 objects misplaced (1.853%)
             2844 active+clean
             532  active+clean+scrubbing
             147  active+remapped+backfill_wait
             28   active+remapped+backfill_toofull
             11   active+remapped+backfill_wait+backfill_toofull
             10   active+remapped+backfilling
             6    active+undersized+degraded+remapped+backfill_toofull
             2    active+clean+inconsistent
             1    active+clean+scrubbing+deep+inconsistent+repair
             1    active+undersized+remapped+backfilling
             1    active+undersized+degraded+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+remapped+inconsistent+backfill_toofull

  io:
    recovery: 183 MiB/s, 312 objects/s


The CephFS metadata pool is not affected by the inconsistent PGs.

The MDSs have this line in their logfile:

"Monitors have assigned me to become a standby."

The filesystem is joinable:

# ceph fs lsflags storage_cluster
joinable allow_snaps allow_multimds_snaps refuse_client_session

But no MDS joins:

# ceph fs status
storage_cluster - 0 clients
===============
RANK  STATE   MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
0    failed
      POOL         TYPE     USED  AVAIL
cephfs_metadata  metadata   490G  12.9T
  cephfs_data      data     970T  54.1T
  shared_data      data    1351T  22.5T
        STANDBY MDS
storage_cluster.sn04.cbvzzu
MDS version: ceph version 18.2.4 (e7ad5345525c7aa95470c26863873b581076945d) reef (stable)


Why?


Regards
