I didn't face any network-related issues, but as I mentioned at the start of
this thread, one of the mon nodes was out of quorum.
Even after restarting that mon there were a few issues in its log, like slow
ops; they eventually cleared, but the same mon kept going out of quorum and
coming back online. During this issue I observed about 40% CPU in wait (in
the top command) on the node where the mon had the problem, but no errors
related to drives or the network were logged in dmesg. Only after this mon
error, maybe 24 hours later, did the MDS fail. The mon issue is cleared now,
after all nodes were reboot-cycled.
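
For reference, a minimal sketch of the kind of checks meant here, assuming
the usual Ceph CLI plus sysstat are available on the affected node (device
names will differ):

ceph health detail        # expands the MON_DOWN / slow ops warnings
ceph crash ls             # lists recent daemon crashes with timestamps
dmesg -T | grep -iE 'error|fail|reset'    # drive / NIC errors, if any
iostat -x 5               # confirms the high iowait and shows which disk is busy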

Yes, disaster recovery will be performed only as a last resort.

What would this command do?

ceph fs reset-mds filesystem --yes-i-really-mean-it
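
For context, the form documented in the disaster-recovery guide linked
below is "ceph fs reset" (as far as I can tell there is no "reset-mds"
subcommand), e.g. with the file system name from the earlier output:

ceph fs reset mumstrg --yes-i-really-mean-it

My understanding from the docs is that this only resets the file system's
MDS map state to defaults (keeping the name and pools) so that rank 0 can
be brought up fresh; it does not touch the metadata or data pools
themselves.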


On Thu, Apr 17, 2025 at 5:19 PM Eugen Block <ebl...@nde.ag> wrote:

> Was there any issue in your network or anything? Something that would
> explain the MDS crash? I would recommend scanning syslog, dmesg, etc.
> for anything suspicious. If you don't find anything, you might need to
> go through
> https://docs.ceph.com/en/latest/cephfs/disaster-recovery-experts/ to
> get your CephFS back up.
>
> Don't forget to create a backup of the journals. Read the instructions
> carefully; you might not need to do all of the mentioned steps. I would
> start with (for both journals where applicable):
>
> cephfs-journal-tool journal export backup.bin
> cephfs-journal-tool event recover_dentries summary
> cephfs-journal-tool [--rank=<fs_name>:{mds-rank|all}] journal reset
> --yes-i-really-really-mean-it
> cephfs-table-tool all reset session
>
> But again, read the instructions carefully. This procedure is usually
> the last resort; I would first try to find out what caused this
> situation.
>
> Zitat von Amudhan P <amudha...@gmail.com>:
>
> > I don't think I have a memory issue.
> > Sorry for sending the log as a file; Pastebin is not working due to the
> > size limitation.
> >
> > Crash info
> >
> >> ceph crash info
> >> 2025-04-17T08:50:35.931485Z_0109f173-206c-471e-afac-c0d3e0aa2785
> >> {
> >>     "backtrace": [
> >>         "/lib64/libpthread.so.0(+0x12b20) [0x7fbd42bbdb20]",
> >>         "/usr/lib64/ceph/libceph-common.so.2(+0x8ec7a00)
> [0x7fbd4c81da00]"
> >>     ],
> >>     "ceph_version": "16.2.6",
> >>     "crash_id":
> >> "2025-04-17T08:50:35.931485Z_0109f173-206c-471e-afac-c0d3e0aa2785",
> >>     "entity_name": "mds.mummasstrg.strg-node1.gchapr",
> >>     "os_id": "centos",
> >>     "os_name": "CentOS Linux",
> >>     "os_version": "8",
> >>     "os_version_id": "8",
> >>     "process_name": "ceph-mds",
> >>     "stack_sig":
> >> "5238fe1be0b82b479b10ee7d17b5ad3182cdc93ede581af63d627a472a4fcf9e",
> >>     "timestamp": "2025-04-17T08:50:35.931485Z",
> >>     "utsname_hostname": "strg-node1",
> >>     "utsname_machine": "x86_64",
> >>     "utsname_release": "5.10.0-8-amd64",
> >>     "utsname_sysname": "Linux",
> >>     "utsname_version": "#1 SMP Debian 5.10.46-5 (2021-09-23)"
> >> }
> >>
> >
> > On Thu, Apr 17, 2025 at 2:55 PM Eugen Block <ebl...@nde.ag> wrote:
> >
> >> Oh right, I just noticed you had the status in your first message. Is
> >> there any chance the MDS is out of memory? Can you paste a complete
> >> (fresh) startup log on some pastebin or so? Also, a complete 'ceph
> >> crash info <crash>' could be useful.
> >>
> >> Zitat von Amudhan P <amudha...@gmail.com>:
> >>
> >> > I have run the status and stat commands; below is the output.
> >> >
> >> > ceph -s
> >> >
> >> > cluster:
> >> >     id:     7b3a4952-2131-11ec-94ce-0cc47a5ec98a
> >> >     health: HEALTH_WARN
> >> >             2 failed cephadm daemon(s)
> >> >             1 filesystem is degraded
> >> >             insufficient standby MDS daemons available
> >> >             7 daemons have recently crashed
> >> >
> >> >   services:
> >> >     mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3 (age 20h)
> >> >     mgr: strg-node2.unyimy(active, since 20h), standbys:
> >> strg-node1.ivkfid
> >> >     mds: 1/1 daemons up
> >> >     osd: 32 osds: 32 up (since 20h), 32 in (since 10w)
> >> >
> >> >   data:
> >> >     volumes: 0/1 healthy, 1 recovering
> >> >     pools:   3 pools, 321 pgs
> >> >     objects: 15.49M objects, 54 TiB
> >> >     usage:   109 TiB used, 66 TiB / 175 TiB avail
> >> >     pgs:     317 active+clean
> >> >              4   active+clean+scrubbing+deep
> >> >
> >> >
> >> > ceph mds stat
> >> > mumstrg:1/1 {0=mumstrg.strg-node1.gchapr=up:replay(laggy or crashed)}
> >> >
> >> > ceph osd lspools
> >> > 1 device_health_metrics
> >> > 2 cephfs.mumstrg.meta
> >> > 3 cephfs.mumstrg.data
> >> >
> >> >
> >> >
> >> > On Thu, Apr 17, 2025 at 10:33 AM Eugen Block <ebl...@nde.ag> wrote:
> >> >
> >> >> What’s your overall Ceph status? It says data pool 3 not found.
> >> >>
> >> >> Zitat von Amudhan P <amudha...@gmail.com>:
> >> >>
> >> >> > There are a few more log lines in the MDS log. I have highlighted a
> >> >> > few lines which I am not sure about.
> >> >> >
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -79>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74c6ecd780  5 asok(0x560a2c44e000)
> >> >> > register_command dump inode hook 0x560a2c354580
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -78>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74c6ecd780  5 asok(0x560a2c44e000)
> >> >> > register_command exit hook 0x560a2c354580
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -77>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74c6ecd780  5 asok(0x560a2c44e000)
> >> >> > register_command respawn hook 0x560a2c354580
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -76>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74c6ecd780  5 asok(0x560a2c44e000)
> >> >> > register_command heap hook 0x560a2c354580
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -75>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74b5030700  1
> >> >> mds.mumstrg.strg-node3.xhxbwx
> >> >> > Updating MDS map to version 127517 f
> >> >> > rom mon.2
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -74>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74c6ecd780  5 asok(0x560a2c44e000)
> >> >> > register_command cpu_profiler hook 0x560a2c35458
> >> >> >
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -73>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74b302c700  5
> >> >> > mds.beacon.mumstrg.strg-node3.xhxbwx Sending beacon up:boot seq 1
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -72>
> >> >> > 2025-04-16T14:43:59.170+0000 7f74b302c700 10 monclient:
> >> _send_mon_message
> >> >> > to mon.strg-node3 at v2:10.0.103.3:3300/
> >> >> >
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -71>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  1
> >> >> mds.mumstrg.strg-node3.xhxbwx
> >> >> > Updating MDS map to version 127518 f
> >> >> > rom mon.2
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -70>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700 10 monclient: _renew_subs
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -69>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700 10 monclient:
> >> _send_mon_message
> >> >> > to mon.strg-node3 at v2:10.0.103.3:3300/
> >> >> >
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -68>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  4 mds.0.purge_queue
> >> >> operator():
> >> >> >  data pool 3 not found in OSDMap
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -67>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  5 asok(0x560a2c44e000)
> >> >> > register_command objecter_requests hook 0x560a2c
> >> >> > 3544c0
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -66>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700 10 monclient: _renew_subs
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -65>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700 10 monclient:
> >> _send_mon_message
> >> >> > to mon.strg-node3 at v2:10.0.103.3:3300/
> >> >> >
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -64>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700 10 log_channel(cluster)
> >> >> > update_config to_monitors: true to_syslog: false
> >> >> >  syslog_facility: daemon prio: info to_graylog: false graylog_host:
> >> >> > 127.0.0.1 graylog_port: 12201)
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -63>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  4 mds.0.purge_queue
> >> >> operator():
> >> >> >  data pool 3 not found in OSDMap
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -62>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  4 mds.0.0 handle_osd_map
> >> epoch
> >> >> > 0, 0 new blocklist entries
> >> >> > Apr 16 20:13:59 strg-node3 bash[59114]: debug    -61>
> >> >> > 2025-04-16T14:43:59.254+0000 7f74b5030700  1 mds.0.127518
> >> handle_mds_map
> >> >> i
> >> >> > am now mds.0.127518
> >> >> >
> >> >> >> Apr 16 20:13:59 strg-node3 bash[59114]: debug    -60>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700  1 mds.0.127518
> >> handle_mds_map
> >> >> >> state change up:boot --> up:replay
> >> >> >> Apr 16 20:13:59 strg-node3 bash[59114]: debug    -59>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700  5
> >> >> >> mds.beacon.mummasstrg.strg-node3.xhxbwx set_want_state: up:boot ->
> >> >> up:replay
> >> >> >> Apr 16 20:13:59 strg-node3 bash[59114]: debug    -58>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700  1 mds.0.127518
> >> replay_start
> >> >> >> *Apr 16 20:13:59 strg-node3 bash[59114]: debug    -57>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700  1 mds.0.127518  waiting
> >> for
> >> >> >> osdmap 45749 (which blocklists prior instance)*
> >> >> >> Apr 16 20:13:59 strg-node3 bash[59114]: debug    -56>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700 10 monclient:
> >> >> _send_mon_message
> >> >> >> to mon.strg-node3 at v2:10.0.103.3:3300/0
> >> >> >> *Apr 16 20:13:59 strg-node3 bash[59114]: debug    -55>
> >> >> >> 2025-04-16T14:43:59.254+0000 7f74b5030700  4 mds.0.purge_queue
> >> >> operator():
> >> >> >>  data pool 3 not found in OSDMap*
> >> >> >>
> >> >> >
> >> >> >
> >> >> > On Thu, Apr 17, 2025 at 7:06 AM Amudhan P <amudha...@gmail.com>
> >> wrote:
> >> >> >
> >> >> >> Eugen,
> >> >> >>
> >> >> >> This is the output of the commands:
> >> >> >> cephfs-journal-tool --rank=mumstrg:all --journal=purge_queue
> journal
> >> >> >> inspect
> >> >> >> Overall journal integrity: OK
> >> >> >> cephfs-journal-tool --rank=mumstrg:all --journal=mdlog journal
> >> inspect
> >> >> >> Overall journal integrity: OK
> >> >> >>
> >> >> >> On Thu, Apr 17, 2025 at 2:59 AM Eugen Block <ebl...@nde.ag>
> wrote:
> >> >> >>
> >> >> >>> I think either your mdlog or the purge_queue journal is
> corrupted:
> >> >> >>>
> >> >> >>> 2025-04-16T09:59:30.146+0000 7f43cf872700  2 mds.0.127506
> Booting:
> >> 2:
> >> >> >>> waiting for purge queue recovered
> >> >> >>> Apr 16 15:29:30 strg-node4 bash[7566]: debug     -1>
> >> >> >>> 2025-04-16T09:59:30.146+0000 7f43d9085700 10 monclient:
> >> >> get_auth_request
> >> >> >>> con 0x562856a25400 auth_method 0
> >> >> >>> Apr 16 15:29:30 strg-node4 bash[7566]: debug      0>
> >> >> >>> 2025-04-16T09:59:30.230+0000 7f43ce06f700 -1 *** Caught signal
> >> >> >>> (Segmentation fault) **
> >> >> >>> Apr 16 15:29:30 strg-node4 bash[7566]:  in thread 7f43ce06f700
> >> >> >>> thread_name:md_log_replay
> >> >> >>>
> >> >> >>> Can you paste the output of these commands?
> >> >> >>>
> >> >> >>> cephfs-journal-tool --rank={YOUR_CEPH_FS}:all
> --journal=purge_queue
> >> >> >>> journal inspect
> >> >> >>> cephfs-journal-tool --rank={YOUR_CEPH_FS}:all --journal=mdlog
> >> journal
> >> >> >>> inspect
> >> >> >>>
> >> >> >>> I expect one or more damaged entries. Check this thread for more
> >> >> details:
> >> >> >>>
> >> >> >>> https://www.spinics.net/lists/ceph-users/msg80124.html
> >> >> >>>
> >> >> >>> You should try to back up the journal, but in my case that wasn't
> >> >> >>> possible, so I had no other choice than to reset it.
> >> >> >>>
> >> >> >>> Zitat von Amudhan P <amudha...@gmail.com>:
> >> >> >>>
> >> >> >>> > Hi,
> >> >> >>> >
> >> >> >>> > I am having 2 problems with my Ceph version 16.2.6
> >> >> >>> > (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable),
> >> >> >>> > deployed through cephadm.
> >> >> >>> >
> >> >> >>> > First issue :-
> >> >> >>> > 1 out of 3 mon services went out of quorum.
> >> >> >>> > When the service is restarted it comes back to normal, but after
> >> >> >>> > a few minutes the ceph watch log reports slow ops and the mon
> >> >> >>> > goes out of quorum again.
> >> >> >>> > The node where this mon service failed showed one weird thing: I
> >> >> >>> > could see 40% wait in the top command. But I don't see any errors
> >> >> >>> > in dmesg or anything related to drive I/O errors.
> >> >> >>> > Below are the log lines printed by the ceph watch command.
> >> >> >>> >
> >> >> >>> > 2025-04-16T09:30:00.000393+0530 mon.strg-node2 [WRN] [WRN]
> >> MON_DOWN:
> >> >> 1/3
> >> >> >>> > mons down, quorum strg-node2,strg-node3
> >> >> >>> > 2025-04-16T09:30:00.000416+0530 mon.strg-node2 [WRN]
> >> >>  mon.strg-node1
> >> >> >>> > (rank 0) addr [v2:10.0.103.1:3300/0,v1:10.0.103.1:6789/0] is
> down
> >> >> (out
> >> >> >>> of
> >> >> >>> > quorum)
> >> >> >>> >
> >> >> >>> > For now this has not appeared again.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > Second issue, CephFS degraded :-
> >> >> >>> > I have 2 MDS services running on 2 different nodes. Both are in
> >> >> >>> > a stopped state.
> >> >> >>> > When running the ceph -s command:
> >> >> >>> >
> >> >> >>> >   cluster:
> >> >> >>> >     id:     7b3a4952-2131-11ec-94ce-0cc47a5ec98a
> >> >> >>> >     health: HEALTH_WARN
> >> >> >>> >             2 failed cephadm daemon(s)
> >> >> >>> >             1 filesystem is degraded
> >> >> >>> >             insufficient standby MDS daemons available
> >> >> >>> >
> >> >> >>> >   services:
> >> >> >>> >     mon: 3 daemons, quorum strg-node1,strg-node2,strg-node3
> (age
> >> 4h)
> >> >> >>> >     mgr: strg-node2.unyimy(active, since 4h), standbys:
> >> >> >>> strg-node1.ivkfid
> >> >> >>> >     mds: 1/1 daemons up
> >> >> >>> >     osd: 32 osds: 32 up (since 4h), 32 in (since 10w)
> >> >> >>> >
> >> >> >>> >   data:
> >> >> >>> >     volumes: 0/1 healthy, 1 recovering
> >> >> >>> >     pools:   3 pools, 321 pgs
> >> >> >>> >     objects: 15.49M objects, 54 TiB
> >> >> >>> >     usage:   109 TiB used, 66 TiB / 175 TiB avail
> >> >> >>> >     pgs:     321 active+clean
> >> >> >>> >
> >> >> >>> > The volume shows recovering, but there hasn't been any progress
> >> >> >>> > so far, and even a manual start of the mds service fails again.
> >> >> >>> > Under services in the ceph -s output it shows the mds as up, but
> >> >> >>> > no mds service is actually running.
> >> >> >>> >
> >> >> >>> > Below is a log snippet from one of the mds services.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> >             -25> 2025-04-16T09:59:29.954+0000 7f43d0874700  1
> >> >> >>> > mds.0.journaler.pq(ro) _finish_read_head loghead(trim
> 13967032320,
> >> >> ex>
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -24>
> >> >> >>> > 2025-04-16T09:59:29.954+0000 7f43d0874700  1
> >> mds.0.journaler.pq(ro)
> >> >> >>> probing
> >> >> >>> > for end of the log
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -23>
> >> >> >>> > 2025-04-16T09:59:29.954+0000 7f43d9085700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a17400 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -22>
> >> >> >>> > 2025-04-16T09:59:29.954+0000 7f43d8884700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a17c00 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -21>
> >> >> >>> > 2025-04-16T09:59:29.974+0000 7f43cf071700  1
> >> >> mds.0.journaler.mdlog(ro)
> >> >> >>> > recover start
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -20>
> >> >> >>> > 2025-04-16T09:59:29.974+0000 7f43cf071700  1
> >> >> mds.0.journaler.mdlog(ro)
> >> >> >>> > read_head
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -19>
> >> >> >>> > 2025-04-16T09:59:29.974+0000 7f43cf071700  4 mds.0.log Waiting
> for
> >> >> >>> journal
> >> >> >>> > 0x200 to recover...
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -18>
> >> >> >>> > 2025-04-16T09:59:29.974+0000 7f43d8083700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a25000 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -17>
> >> >> >>> > 2025-04-16T09:59:29.998+0000 7f43d0874700  1
> >> mds.0.journaler.pq(ro)
> >> >> >>> > _finish_probe_end write_pos = 13968309289 (hea>
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -16>
> >> >> >>> > 2025-04-16T09:59:29.998+0000 7f43d0874700  4 mds.0.purge_queue
> >> >> >>> operator():
> >> >> >>> > open complete
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -15>
> >> >> >>> > 2025-04-16T09:59:29.998+0000 7f43d0874700  1
> >> mds.0.journaler.pq(ro)
> >> >> >>> > set_writeable
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -14>
> >> >> >>> > 2025-04-16T09:59:29.998+0000 7f43cf872700  1
> >> >> mds.0.journaler.mdlog(ro)
> >> >> >>> > _finish_read_head loghead(trim 189741504921>
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -13>
> >> >> >>> > 2025-04-16T09:59:29.998+0000 7f43cf872700  1
> >> >> mds.0.journaler.mdlog(ro)
> >> >> >>> > probing for end of the log
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -12>
> >> >> >>> > 2025-04-16T09:59:30.002+0000 7f43d9085700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a25c00 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -11>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf872700  1
> >> >> mds.0.journaler.mdlog(ro)
> >> >> >>> > _finish_probe_end write_pos = 1897428915052>
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug    -10>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf071700  4 mds.0.log Journal
> >> 0x200
> >> >> >>> > recovered.
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -9>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf071700  4 mds.0.log
> Recovered
> >> >> journal
> >> >> >>> > 0x200 in format 1
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -8>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf071700  2 mds.0.127506
> >> Booting: 1:
> >> >> >>> > loading/discovering base inodes
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -7>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf071700  0 mds.0.cache
> creating
> >> >> system
> >> >> >>> > inode with ino:0x100
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -6>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43cf071700  0 mds.0.cache
> creating
> >> >> system
> >> >> >>> > inode with ino:0x1
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -5>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43d8884700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a25800 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -4>
> >> >> >>> > 2025-04-16T09:59:30.098+0000 7f43d8083700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a5dc00 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -3>
> >> >> >>> > 2025-04-16T09:59:30.146+0000 7f43cf872700  2 mds.0.127506
> >> Booting: 2:
> >> >> >>> > replaying mds log
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -2>
> >> >> >>> > 2025-04-16T09:59:30.146+0000 7f43cf872700  2 mds.0.127506
> >> Booting: 2:
> >> >> >>> > waiting for purge queue recovered
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug     -1>
> >> >> >>> > 2025-04-16T09:59:30.146+0000 7f43d9085700 10 monclient:
> >> >> get_auth_request
> >> >> >>> > con 0x562856a25400 auth_method 0
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]: debug      0>
> >> >> >>> > 2025-04-16T09:59:30.230+0000 7f43ce06f700 -1 *** Caught signal
> >> >> >>> > (Segmentation fault) **
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]:  in thread 7f43ce06f700
> >> >> >>> > thread_name:md_log_replay
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]:  ceph version 16.2.6
> >> >> >>> > (ee28fb57e47e9f88813e24bbf4c14496ca299d31) pacific (stable)
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]:  1:
> >> >> >>> /lib64/libpthread.so.0(+0x12b20)
> >> >> >>> > [0x7f43dd293b20]
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]:  2:
> >> >> >>> > /usr/lib64/ceph/libceph-common.so.2(+0x8ec7a00)
> [0x7f43e6ef3a00]
> >> >> >>> > Apr 16 15:29:30 strg-node4 bash[7566]:  NOTE: a copy of the
> >> >> executable,
> >> >> >>> or
> >> >> >>> > `objdump -rdS <executable>` is needed to interpret this.
> >> >> >>> >
> >> >> >>> >
> >> >> >>> > I am not sure what caused the issue, and I couldn't find any
> >> >> >>> > resources to fix it.
> >> >> >>> > I need help from someone to bring the Ceph cluster back online.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
