[ceph-users] Re: Module 'devicehealth' has failed

2025-03-11 Thread Eugen Block
Hm, is it really necessary to configure all the device paths manually? I'd recommend using the rotational flags to distinguish between OSD and DB/WAL devices. Can you give it a try with a simpler spec file? Something like: service_type: osd service_id: node1.ec.all_disks service_name: osd.node

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-07 Thread Eugen Block
I don't have a good idea right now, I literally took the same spec file as yours and it works fine for me in a tiny lab cluster. Maybe someone else has a good idea. Quoting Alex from North: I did. It says more or less the same Mar 06 10:44:05 node1.ec.mts conmon[10588]: 2025-03-06T10

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-06 Thread Alex from North
Thanks for the help, buddy! I really appreciate it! Will try to wait. Maybe someone else jumps in. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-06 Thread Alex from North
I did. It says more or less the same Mar 06 10:44:05 node1.ec.mts conmon[10588]: 2025-03-06T10:44:05.769+ 7faca5624640 -1 log_channel(cephadm) log [ERR] : Failed to apply osd.node1.ec.mts_all_disks spec DriveGroupSpec.from_json(yaml.safe_load('''service_type: osd Mar 06 10:44:05 node1.ec.m

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-06 Thread Alex from North
A bit more detail. Now I've noticed that ceph health detail reports [WRN] CEPHADM_APPLY_SPEC_FAIL: Failed to apply 1 service(s): osd.node1.ec.all_disks osd.node1.ec.all_disks: Expecting value: line 1 column 2311 (char 2310) Okay, I checked my spec but do not see anything suspicious
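The "Expecting value: line 1 column 2311 (char 2310)" text has the shape of the message Python's json module raises when it cannot find a value at a given offset. Below is a minimal sketch, not taken from the thread, of how to look at the text around such an offset. It assumes the failing payload can be captured as a string; the helper name show_json_error_context and the sample payload are hypothetical and are not part of Ceph.

```python
import json


def show_json_error_context(payload: str, width: int = 20) -> str:
    """Return the decoder message plus the text around the failing offset."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as err:
        # err.pos is the 0-based character offset reported as "(char N)".
        start = max(err.pos - width, 0)
        end = err.pos + width
        return f"{err} near {payload[start:end]!r}"
    return "payload parsed cleanly"


# Hypothetical payload: valid JSON up to the point where a value is missing.
sample = '{"device": "sda", "smart": }'
print(show_json_error_context(sample))
```

Running the same helper against the real payload, if it can be dumped from the MGR, would show which characters sit at the reported offset.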

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-06 Thread Alex from North
I will provide any info you need, just gimme a sign. My starter post was related to 19.2.0. Now I downgraded (full reinstall, as this is a completely new cluster I wanna run) to 18.2.4 and it's the same story Mar 06 09:37:41 node1.ec.mts conmon[10588]: failed to collect metrics: Mar 06 09:37:41 nod

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-05 Thread Eugen Block
And do you also have the device_health_metrics pool? During one of the upgrades to Quincy or so the older device_health_metrics should have been renamed. But on one customer cluster I found that both were still there, although that didn't cause any trouble. I don't really fully grasp yet wh

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-04 Thread Alex from North
Yes, I do: .mgr 10 1 769 KiB 2 2.3 MiB 0 4.7 PiB

[ceph-users] Re: Module 'devicehealth' has failed

2025-03-04 Thread Eugen Block
Do you have a pool named ".mgr"? Quoting Alex from North: Hello everybody! Running 19.2.0, I've hit an issue that I still cannot resolve: Module 'devicehealth' has failed: Expecting value: line 1 column 2378 (char 2377). In the MGR log I see Mar 04 12:48:07 node2.ec.mts ceph-mgr[3821449]

[ceph-users] Re: Module 'devicehealth' has failed

2021-09-05 Thread Davíð Steinn Geirsson
Hi, On Sun, Sep 05, 2021 at 01:25:32PM +0800, David Yang wrote: > hi, buddy > > I have a ceph file system cluster, using ceph version 15.2.14. > > But the current status of the cluster is HEALTH_ERR. > > health: HEALTH_ERR > Module 'devicehealth' has failed: I had this error af

[ceph-users] Re: Module 'devicehealth' has failed:

2021-06-15 Thread Torkil Svensgaard
Hi Thanks, I guess this might have something to do with it: " Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug 2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify devicehealth.notify: Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug 2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify Traceba

[ceph-users] Re: Module 'devicehealth' has failed:

2021-06-15 Thread Sebastian Wagner
Hi Torkil, you should see more information in the MGR log file. Might be an idea to restart the MGR to get some recent logs. On 15.06.21 at 09:41, Torkil Svensgaard wrote: Hi Looking at this error in v15.2.13: " [ERR] MGR_MODULE_ERROR: Module 'devicehealth' has failed: Module 'devicehea