With the help of a coworker, we were able to get the orchestrator functional 
again. The issue ended up being malformed JSON in the node exporter 
fields and in the Grafana cert/key entries. I had put the Grafana certs in back 
in v17, but I never touched any of the exporter certs. I don’t know how the 
process for suggesting changes works, so point me in the right direction and 
I will suggest that the ceph config-key entries be checked and 
loaded individually to give the orchestrator the best chance at starting, or at 
least that some output be added when debug is on to tell you which keys are 
actually affected, rather than just a general error pointing at 
“self.known_certs[entity] = json.loads(v)”. Knowing the value of v would have 
saved me a lot of time.
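
To illustrate the suggestion, here is a minimal sketch in Python of per-entry 
loading. It is not the cephadm code: the function name, the logger name, and 
the sample key names are hypothetical, and it assumes the stored entries arrive 
as a plain mapping of key name to raw string.

```python
import json
import logging
from typing import Any, Dict, List, Tuple

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("cert_store_sketch")


def load_entries_individually(raw: Dict[str, str]) -> Tuple[Dict[str, Any], List[str]]:
    """Decode each stored value on its own so one bad entry cannot block the rest.

    Returns the successfully decoded entries and the list of keys that failed.
    """
    loaded: Dict[str, Any] = {}
    bad_keys: List[str] = []
    for key, value in raw.items():
        try:
            loaded[key] = json.loads(value)
        except json.JSONDecodeError as exc:
            bad_keys.append(key)
            # Always name the offending key.
            logger.error("skipping entry %r: %s", key, exc)
            # Values may hold key material, so show them only at debug level.
            logger.debug("raw value for %r: %r", key, value)
    return loaded, bad_keys


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    # Hypothetical key names and values, for illustration only.
    sample = {"node_exporter_cert": "", "grafana_cert": '{"host1": "PEM"}'}
    good, bad = load_entries_individually(sample)
    print("loaded:", sorted(good), "failed:", bad)
```

With something like this, a single malformed entry would be reported by name 
and skipped instead of preventing the whole module from being constructed.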

From: Laimis Juzeliūnas <laimis.juzeliu...@oxylabs.io>
Sent: Sunday, January 12, 2025 8:51 AM
To: Frank Frampton <frank.framp...@slcschools.org>; zac.do...@proton.me
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Ceph orch commands failing with Error ENOENT: Module 
not found

Hi Frank,

It seems you are hitting the balancer bug in 19.2 that is common with larger PG 
counts (the same one mentioned in the tracker). There is a fix making its way 
through the final(?) stages of the 19.2.1 release.

Unfortunately the only current option is to keep the balancer off and wait for 
19.2.1 to arrive.
We have managed so far with manual/cron balancing using:
https://github.com/laimis9133/plankton-swarm (our own Swiss Army knife)
https://github.com/TheJJ/ceph-balancer
along with some use of 
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py


Adding Zac directly here to bring attention once more to the issue:
Users attempting to upgrade to 19.2.0 should be made aware of possible balancer 
issues in the documentation here: 
https://docs.ceph.com/en/latest/releases/squid/#v19-2-0-squid


Best,
Laimis J.


On 6 Jan 2025, at 21:58, Frank Frampton <frank.framp...@slcschools.org> wrote:

We recently upgraded from 18.2 to 19.2, and the upgrade went fine. Since the 
upgrade and a manager failover, I can no longer run orchestrator commands. The 
only error I can find on an active manager daemon, or at least the only one 
that stands out, is the following.

2025-01-06T18:48:41.698+0000 7fcf42b99640 -1 mgr load Failed to construct class in 'cephadm'
2025-01-06T18:48:41.698+0000 7fcf42b99640 -1 mgr load Traceback (most recent call last):
 File "/usr/share/ceph/mgr/cephadm/module.py", line 667, in __init__
   self.cert_key_store.load()
 File "/usr/share/ceph/mgr/cephadm/inventory.py", line 2073, in load
   self.known_certs[entity] = json.loads(v)
 File "/lib64/python3.9/json/__init__.py", line 346, in loads
   return _default_decoder.decode(s)
 File "/lib64/python3.9/json/decoder.py", line 337, in decode
   obj, end = self.raw_decode(s, idx=_w(s, 0).end())
 File "/lib64/python3.9/json/decoder.py", line 355, in raw_decode
   raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

2025-01-06T18:48:41.702+0000 7fcf42b99640 -1 mgr operator() Failed to run module in active mode ('cephadm')
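
For reference, the message "Expecting value: line 1 column 1 (char 0)" is what 
Python's json module reports when the input is empty or does not begin with a 
valid JSON token. The short sketch below reproduces it. The sample strings are 
made up for illustration and are not the values from this cluster, and 
decode_error is a hypothetical helper name.

```python
import json


def decode_error(value: str) -> str:
    """Return the JSON decode error message for value, or "ok" if it parses."""
    try:
        json.loads(value)
    except json.JSONDecodeError as exc:
        return str(exc)
    return "ok"


# Illustrative inputs: an empty string, a bare PEM header, and valid JSON.
for sample in ("", "-----BEGIN CERTIFICATE-----", '{"cert": "PEM"}'):
    print(repr(sample), "->", decode_error(sample))
```

The first two inputs yield the same message as in the traceback, so the error 
alone cannot tell an empty value apart from a value stored as raw text rather 
than as JSON.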

For anything to really work in the dashboard I must have the balancer off. 
While the balancer is off I can make changes to the orchestrator in the 
dashboard, and it doesn't give me any trouble. When trying different commands 
from a ceph node, say "ceph cephadm config-check status", it returns "Error 
ENOTSUP: Module 'cephadm' is not enabled/loaded (required by command 'cephadm 
config-check status'): use `ceph mgr module enable cephadm` to enable it". 
Running "ceph mgr module enable cephadm" returns "module 'cephadm' is already 
enabled". I really don't know where to look or what to try to resolve this. Any 
"ceph orch" command results in "Error ENOENT: Module not found".

I don't know that my issue is related to 
https://tracker.ceph.com/issues/68657, but maybe it is.

I have tried the following.
Manually adding a new mgr daemon on a different node; it starts and runs the 
dashboard fine, but things are still not functional.
Failed the mgr several times.
Disabled/Enabled balancer.
Disabled/Enabled mgr modules.
Disabled/Enabled dashboard.

All physical nodes are running Debian 12.




Frank Frampton
Senior Network Services Administrator
Salt Lake City School District
Desk: (801) 578-8223
Follow the district: 
Facebook <https://www.facebook.com/slcschools> | 
Instagram <https://instagram.com/slcschools> | 
Twitter <https://twitter.com/slcschools> | 
www.slcschools.org <http://www.slcschools.org/>
Excellence and Equity: every student, every classroom, every day
Scanned By Microsoft EOP
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
