[ceph-users] Re: no recovery running
Hey Joffrey, try to switch back to the wpq scheduler in ceph.conf: osd_op_queue = wpq ...and restart all OSDs. I also had issues where recovery was very, very slow (10 kB/s). Best Regards, Alex Walender

On 17.10.24 at 11:44, Joffrey wrote:

Hi, this is my cluster:

  cluster:
    id:     c300532c-51fa-11ec-9a41-0050569c3b55
    health: HEALTH_WARN
            Degraded data redundancy: 2062374/1331064781 objects degraded (0.155%), 278 pgs degraded, 40 pgs undersized
            2497 pgs not deep-scrubbed in time
            2497 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 9d)
    mgr: hbgt-ceph1-mon3.gmfzqm (active, since 10d), standbys: hbgt-ceph1-mon2.nteihj, hbgt-ceph1-mon1.thrnnu
    osd: 96 osds: 96 up (since 9d), 96 in (since 45h); 1588 remapped pgs
    rgw: 3 daemons active (3 hosts, 2 zones)

  data:
    pools:   16 pools, 2497 pgs
    objects: 266.22M objects, 518 TiB
    usage:   976 TiB used, 808 TiB / 1.7 PiB avail
    pgs:     2062374/1331064781 objects degraded (0.155%)
             349917519/1331064781 objects misplaced (26.289%)
             1312 active+remapped+backfill_wait
             864  active+clean
             199  active+recovery_wait+degraded+remapped
             38   active+recovery_wait+degraded
             33   active+undersized+degraded+remapped+backfill_wait
             33   active+recovery_wait+remapped
             7    active+recovery_wait
             6    active+undersized+degraded+remapped+backfilling
             2    active+recovering+remapped
             1    active+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+recovery_wait+undersized+degraded+remapped

  io:
    client: 683 KiB/s rd, 2.2 KiB/s wr, 51 op/s rd, 2 op/s wr

No recovery is running and I don't understand why. I have free space:

  ID  CLASS  WEIGHT      REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
  -1         1784.12231         -  1.7 PiB  976 TiB  895 TiB  298 GiB  4.1 TiB  808 TiB  54.72  1.00    -          root default
  -5          208.09680         -  208 TiB  142 TiB  130 TiB   51 GiB  605 GiB   66 TiB  68.14  1.25    -          host hbgt-ceph1-osd01
   1    hdd    17.34140   1.0       17 TiB   11 TiB   11 TiB   33 KiB   49 GiB  5.9 TiB  66.16  1.21  136      up  osd.1
   3    hdd    17.34140   1.0       17 TiB   11 TiB   10 TiB   23 GiB   49 GiB  6.3 TiB  63.80  1.17  139      up  osd.3
   5    hdd    17.34140   1.0       17 TiB   13 TiB   12 TiB  139 MiB   53 GiB  4.8 TiB  72.31  1.32  142      up  osd.5
   7    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB   11 GiB   51 GiB  5.6 TiB  67.97  1.24  145      up  osd.7
   9    hdd    17.34140   1.0       17 TiB   11 TiB   10 TiB  2.2 GiB   49 GiB  6.0 TiB  65.67  1.20  140      up  osd.9
  11    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB  329 MiB   50 GiB  5.5 TiB  68.42  1.25  145      up  osd.11
  13    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB  1.5 GiB   52 GiB  5.1 TiB  70.45  1.29  153      up  osd.13
  15    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB   61 KiB   48 GiB  5.7 TiB  66.85  1.22  144      up  osd.15
  17    hdd    17.34140   1.0       17 TiB   11 TiB  9.5 TiB  272 MiB   45 GiB  6.8 TiB  60.63  1.11  120      up  osd.17
  19    hdd    17.34140   1.0       17 TiB   11 TiB   10 TiB   12 GiB   50 GiB  5.9 TiB  65.90  1.20  134      up  osd.19
  21    hdd    17.34140   1.0       17 TiB   13 TiB   12 TiB  1.6 GiB   57 GiB  4.1 TiB  76.49  1.40  152      up  osd.21
  23    hdd    17.34140   1.0       17 TiB   13 TiB   12 TiB   31 KiB   54 GiB  4.7 TiB  73.10  1.34  124      up  osd.23
  -3          208.09680         -  208 TiB  146 TiB  134 TiB   64 GiB  629 GiB   62 TiB  70.05  1.28    -          host hbgt-ceph1-osd02
   0    hdd    17.34140   1.0       17 TiB   11 TiB  9.8 TiB   22 GiB   49 GiB  6.6 TiB  62.07  1.13  124      up  osd.0
   2    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB  1.7 GiB   52 GiB  5.2 TiB  70.14  1.28  150      up  osd.2
   4    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB  1.8 GiB   48 GiB  5.8 TiB  66.83  1.22  152      up  osd.4
   6    hdd    17.34140   0.85004   17 TiB   13 TiB   12 TiB   11 GiB   58 GiB  4.0 TiB  76.85  1.40  153      up  osd.6
   8    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB   11 GiB   54 GiB  4.9 TiB  71.58  1.31  152      up  osd.8
  10    hdd    17.34140   1.0       17 TiB   11 TiB   10 TiB  6.3 MiB   47 GiB  6.1 TiB  64.91  1.19  133      up  osd.10
  12    hdd    17.34140   1.0       17 TiB   12 TiB   11 TiB  109 MiB   51 GiB  5.6 TiB  67.72  1.24  137      up  osd.12
  14    hdd    17.34140
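For reference, a minimal sketch of the switch Alex describes, assuming a cephadm-managed cluster where settings live in the central config database rather than a hand-edited ceph.conf (adjust to how this cluster is actually deployed; osd.1 is just an example ID):

  # put all OSDs back on the wpq scheduler
  ceph config set osd osd_op_queue wpq
  # the scheduler is only read at OSD start-up, so restart the OSDs,
  # one host or failure domain at a time
  ceph orch daemon restart osd.1     # repeat per OSD; on non-cephadm hosts use: systemctl restart ceph-osd.target
  # verify what a running OSD actually uses
  ceph config show osd.1 osd_op_queue

On clusters not managed by the orchestrator, the same osd_op_queue = wpq line in ceph.conf plus an OSD restart achieves the same thing.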
[ceph-users] Re: Destroyed OSD clinging to wrong disk
Dave,

If there's one bitter lesson I learned from IBM's OS/2, it was that one should never store critical information in two different repositories. There Should Be Only One, and you may replicate it, but at the end of the day, if you don't have a single point of Authority, you'll suffer.

Regrettably, Ceph has issues there. Very frequently data displayed in the Dashboard does not match data from the Ceph command line, which to me indicates that the information isn't always coming from the same place. To be clear, I'm not talking about the old /etc/ceph stuff versus the more modern configuration database; I'm talking about cases where apparently sometimes info comes from components (such as direct from an OSD) and sometimes from somewhere else, and they're not staying in sync.

I feel your pain. For certain versions of Ceph, it is possible to have the same OSD defined both as administered and legacy. The administered stuff tends to have dynamically-defined systemd units, which means you can't simply delete the offending service file. Or even find it, unless you know where such things live. Go back through this list's history to about June and you'll see a lot of wailing from me about that sort of thing and the "phantom host" issue, where a non-ceph host managed to insinuate itself into the mix and took forever to expunge. I'm very grateful to Eugen for the help there. It's possible you might find some insights if you wade through it.

To the best of my knowledge everything relating to an OSD resides in one of three places:

1. The /etc/ceph directory (mostly deprecated except for maybe keyring). And of course, the FSID!
2. The Ceph configuration repository (possibly keyring, not sure if much else).
3. The Ceph OSD directory under /var/lib/ceph. Whether legacy or administered, the exact path may differ, but the overall layout is the same. One directory per OSD. Everything important relating to the OSD is there, or at least linked from there.

You haven't fully purged a defective OSD until it no longer has a presence in the "ceph osd tree" output, the "ceph orch ps" output, or the OSD host's systemctl list as an "osd" service. Which is easier said than done, but setting the unwanted OSD's weights to 0 is a major help.

In one particular case where I had a doubly-defined OSD, I think I ultimately cured it by turning off the OSD, deleting an OSD service file for the legacy OSD definition from /etc/systemd/system, then drawing a deep breath and doing a "rm -rf /var/lib/ceph/osd,xx", leaving the /var/lib/ceph//osd,xxx alone. Followed by an OSD restart. But do check my previously-mentioned messages to make sure there aren't some "gotchas" that I forgot.

If you have issues with the raw data store under the OSD, then it would take someone wiser and braver than me to repair it without first deleting all OSD definitions that reference it, zapping the raw data store to remove all ceph admin and LVM info that might offend ceph, then re-defining the OSD on the cleaned data store.

While Ceph can be a bit crotchety, I'll give it credit for one thing. Even broken, it's robust enough that I've never lost or corrupted the actual data, despite the fact that I've done an uncomfortable amount of stuff where I'm just randomly banging on things with a hammer. I still do backups, though. :)

Now if I could just persuade the auto-tuner to actually adjust the pg sizes the way I told it to.

Tim

On Tue, 2024-10-29 at 22:37 -0400, Dave Hall wrote:
> Tim,
>
> Thank you for your guidance.
> Your points are completely understood. It was more that I couldn't figure out why the Dashboard was telling me that the destroyed OSD was still using /dev/sdi when the physical disk with that serial number was at /dev/sdc, and when another OSD was also reporting /dev/sdi. I figured that there must be some information buried somewhere. I don't know where this metadata comes from or how it gets updated when things like 'drive letters' change, but the metadata matched what the dashboard showed, so now I know something new.
>
> Regarding the process for bringing the OSD back online with a new HDD, I am still having some difficulties. I used the steps in the Adding/Removing OSDs document under Removing the OSD, and the OSD mostly appears to be gone. However, attempts to use 'ceph-volume lvm prepare' to build the replacement OSD are failing. Same thing with 'ceph orch daemon add osd'.
>
> I think the problem might be that the NVMe LV that was the WAL/DB for the failed OSD did not get cleaned up, but on my systems 4 OSDs use the same NVMe drive for WAL/DB, so I'm not sure how to proceed.
>
> Any suggestions would be welcome.
>
> Thanks.
>
> -Dave
>
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
>
> On Tue, Oct 29, 2024 at 3:13 PM Tim Holloway wrote:
> > Take care when reading the output of "ceph osd metadata". When you are running the OS
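For the leftover WAL/DB LV Dave mentions, a hedged sketch of the usual cleanup. The VG/LV names (cephdb/db-osd13) and the data device (/dev/sdc) are purely hypothetical; confirm the real names with lvs and ceph-volume first, and be careful not to zap the whole NVMe device that the other three OSDs share:

  # see which DB LVs ceph-volume still knows about
  ceph-volume lvm list
  # wipe only the stale DB LV, keeping the LV itself so it can be reused
  ceph-volume lvm zap /dev/cephdb/db-osd13
  # recreate the OSD, pointing block.db at the reused LV
  ceph-volume lvm prepare --data /dev/sdc --block.db cephdb/db-osd13
  ceph-volume lvm activate --all

If the orchestrator is managing the OSDs, 'ceph orch daemon add osd' or a matching drive-group spec is the equivalent route, but the stale LV still has to be zapped first or the orchestrator will refuse the device.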
[ceph-users] Re: Deploy custom mgr module
On Wednesday, October 30, 2024 2:00:56 PM EDT Darrell Enns wrote:
> Is there a simple way to deploy a custom (in-house) mgr module to an orchestrator managed cluster? I assume the module code would need to be included in the mgr container image. However, there doesn't seem to be a straightforward way to do this without having the module merged to upstream ceph (not possible for a custom/in-house solution) or maintaining an in-house container repository and custom container images (a lot of extra maintenance overhead).
>
> Also, what's the best way to handle testing during development? Custom scripts to push the code into to a mgr container in a dev cluster?

There's a tool in the ceph tree designed for this: src/script/cpatch.py (there's also an older shell-based version in the same dir, but that is not maintained WRT python changes as far as I know). There are some downsides to how this script works, as it creates many layers, but it's intended for development, and in these cases having extra container layers is not usually a big deal.

If you are working on a Python-based mgr module, there are flags you can pass to have the script only create images with your local version of src/pybind/mgr (or whatnot, see the --help for more info). If you need extra help with the script, just ask here on the list. I use it frequently and have been maintaining it.

When you build an image you can then push it to a private or public registry and configure ceph(adm) to use it. (See https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment for some hints.)
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
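To make that concrete, a rough sketch of the publish-and-consume side; the registry name and tag are invented for the example, and the cpatch.py invocation itself is deliberately left to its --help:

  # tag and push the image cpatch.py produced (registry/tag are placeholders)
  podman tag <built-image> registry.example.com/ceph/ceph:mgr-devel
  podman push registry.example.com/ceph/ceph:mgr-devel
  # point the mgr daemons at it (assumes a config-db override is acceptable on a dev cluster)
  ceph config set mgr container_image registry.example.com/ceph/ceph:mgr-devel
  ceph orch redeploy mgr

On a throwaway dev cluster this is usually enough; on anything shared, the isolated-environment doc linked above is the safer reference.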
[ceph-users] Re: Deploy custom mgr module
Build your own image based on the Ceph container image.

Joachim Kraftmayer
CEO
joachim.kraftma...@clyso.com
www.clyso.com
Hohenzollernstr. 27, 80801 Munich
Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306

Darrell Enns wrote on Wed, 30 Oct 2024 at 19:01:
> Is there a simple way to deploy a custom (in-house) mgr module to an orchestrator managed cluster? I assume the module code would need to be included in the mgr container image. However, there doesn't seem to be a straightforward way to do this without having the module merged to upstream ceph (not possible for a custom/in-house solution) or maintaining an in-house container repository and custom container images (a lot of extra maintenance overhead).
>
> Also, what's the best way to handle testing during development? Custom scripts to push the code into to a mgr container in a dev cluster?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
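A minimal sketch of that approach, assuming the in-house module lives in ./my_module next to the build context and that quay.io/ceph/ceph is an acceptable base; the tag and registry names are examples, not recommendations:

  cat > Dockerfile <<'EOF'
  FROM quay.io/ceph/ceph:v19.2.0
  # ceph-mgr discovers modules from this path inside the image
  COPY my_module /usr/share/ceph/mgr/my_module
  EOF
  podman build -t registry.example.com/ceph/ceph:v19.2.0-mymod .
  podman push registry.example.com/ceph/ceph:v19.2.0-mymod

Once the mgr daemons run that image, the module should show up for 'ceph mgr module enable my_module'.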
[ceph-users] Re: Deploy custom mgr module
Speaking abstractly, I can see 3 possible approaches.

1. You can create a separate container and invoke it from the mgr container as a micro-service. As to how, I don't know. This is likely the cleanest approach.

2. You can create a Dockerfile based on the stock mgr but with your extensions added. The main problem with this is that, from what I can see, the cephadm tool has the names and repositories of the stock containers hard-wired in. Which ensures quality (getting the right versions) and integrity (makes it hard for a bad agent to swap in a malware module). So more information is needed at least.

3. You can inject your code into the stock container image by packaging it as an RPM, adding a local RPM repository to the stock container, and installing the extra code with something like "docker exec -it [cephadm-container-name] /usr/bin/dnf install mycode".

The third option does require that the infrastructure to run dnf/yum hasn't been removed from the container image. Also note that if you're running a dynamic container launch, you might have to deal with having to re-install your code every time the container launches, because there would be no persistent image. However, option 3 would, if the stars are right, be something that Ansible could easily handle.

As for testing, I'd look at the source for the mgr module and its regression tests. Plus of course testing your own code is something you'd have to do yourself.

Tim

On Wed, 2024-10-30 at 18:00 +0000, Darrell Enns wrote:
> Is there a simple way to deploy a custom (in-house) mgr module to an orchestrator managed cluster? I assume the module code would need to be included in the mgr container image. However, there doesn't seem to be a straightforward way to do this without having the module merged to upstream ceph (not possible for a custom/in-house solution) or maintaining an in-house container repository and custom container images (a lot of extra maintenance overhead).
>
> Also, what's the best way to handle testing during development? Custom scripts to push the code into to a mgr container in a dev cluster?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: "ceph orch" not working anymore
Hello Eugen,

thanks a lot. We got our downtime today to work on the cluster. However, nothing worked, even with Ceph 19. None of the ceph orch commands work:

Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

This has nothing to do with osd_remove_queue. Getting back the MON quorum with three MONs and also three MGRs with Squid did not help at all. I still think this can be fixed somehow, perhaps by editing the mon store, but I don't know where. We decided to deploy a new cluster since backups are available. Thanks again everybody.

Best, Malte

On 18.10.24 16:37, Eugen Block wrote:

Hi Malte, so I would only suggest to bring up a new MGR, issue a failover to that MGR and see if you get the orchestrator to work again. It should suffice to change the container_image in the unit.run file (/var/lib/ceph/{FSID}/mgr.{MGR}/unit.run): CONTAINER_IMAGE={NEWER IMAGE} So stop one MGR, change the container image, start it and make sure it takes over as the active MGR.

But I would like to know if I could replace the cephadm on one running node, stop the MGR and deploy a new MGR on that node with this: https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon cephadm --image deploy --fsid --name mgr.hostname.smfvfd --config-json config-json.json

This approach probably works as well, but I haven't tried that yet.

And I still do not know what places cephadm... under /var/lib/ceph/fsid. Does that happen when I enable the orchestrator in the MGR? And can I replace that cephadm by hand?

The orchestrator would automatically download the respective cephadm image into that directory if you changed the container_image config value(s). But I wouldn't do that because you could break your cluster. If for some reason a MON, OSD or some other Ceph daemon would need to be redeployed, you would basically upgrade it. That's why I would suggest to only start one single MGR daemon with a newer version to see how it goes. In case you get the orchestrator to work again, I would "downgrade" it again and see what happens next.

Zitat von Eugen Block : I'm on a mobile phone right now, so I can't go into much detail. But I don't think it's necessary to rebuild an entire node, just a mgr. Otherwise you risk cluster integrity if you redeploy a mon as well with a newer image. I'll respond later in more detail.

Zitat von Malte Stroem : Well, thank you, Eugen. That is what I planned to do. Rebuild the broken node and start a MON and a MGR there with the latest images. Then I will stop the other MGRs and have a look if it's working. But I would like to know if I could replace the cephadm on one running node, stop the MGR and deploy a new MGR on that node with this: https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon cephadm --image deploy --fsid --name mgr.hostname.smfvfd --config-json config-json.json And I still do not know what places cephadm... under /var/lib/ceph/fsid. Does that happen when I enable the orchestrator in the MGR? And can I replace that cephadm by hand? Best, Malte

On 18.10.24 12:11, Eugen Block wrote: Okay, then I misinterpreted your former statement: I think there are entries of the OSDs from the broken node we removed. So the stack trace in the log points to the osd_remove_queue, but I don't understand why it's empty. Is there still some OSD removal going on or something? Did you paste your current cluster status already?
You could probably try starting a Squid mgr daemon by replacing the container image in the unit.run file and see how that goes. Zitat von Malte Stroem : Hello Eugen, thanks a lot. However: ceph config-key get mgr/cephadm/osd_remove_queue is empty! Damn. So should I get a new cephadm with the diff included? Best, Malte On 17.10.24 23:48, Eugen Block wrote: Save the current output to a file: ceph config-key get mgr/cephadm/osd_remove_queue > remove_queue.json Then remove the original_weight key from the json and set the modified key again with: ceph config-key set … Then fail the mgr. Zitat von Malte Stroem : Hello Frederic, Hello Eugen, yes, but I am not sure how to do it. The links says: the config-key responsible was mgr/cephadm/osd_remove_queue This is what it looked like before. After removing the original_weight field and setting the variable again, the cephadm module loads and orch works. So now: Do I remove the value of mgr/cephadm/osd_remove_queue? Or: What is meant by: "After removing the original_weight field and setting the variable again, the cephadm module loads and orch works." I can enter a MGR's container and open the file: /usr/share/ceph/mgr/cephadm/services/osd.py But what is meant by "removing the original_weight field and setting the variable again" and what JSON do you mean, Eugen? osd_obj = OSD.from_json(osd, rm_util=self.rm_util) Code looks like t
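Putting Eugen's steps together, a rough sketch of the config-key round-trip. It only applies when the key actually returns JSON, assumes the stored value is a JSON list of OSD entries, and uses jq, which may not be installed on the host:

  ceph config-key get mgr/cephadm/osd_remove_queue > remove_queue.json
  # drop the original_weight field from every entry (assumption: the value is a JSON list)
  jq 'map(del(.original_weight))' remove_queue.json > remove_queue_fixed.json
  ceph config-key set mgr/cephadm/osd_remove_queue -i remove_queue_fixed.json
  ceph mgr fail

In Malte's case the key came back empty, so this particular fix does not apply; the JSON editing can just as well be done by hand in any editor instead of jq.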
[ceph-users] Deploy custom mgr module
Is there a simple way to deploy a custom (in-house) mgr module to an orchestrator managed cluster? I assume the module code would need to be included in the mgr container image. However, there doesn't seem to be a straightforward way to do this without having the module merged to upstream ceph (not possible for a custom/in-house solution) or maintaining an in-house container repository and custom container images (a lot of extra maintenance overhead). Also, what's the best way to handle testing during development? Custom scripts to push the code into a mgr container in a dev cluster? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Squid 19.2.0 balancer causes restful requests to be lost
I've just upgraded a test cluster from 18.2.4 to 19.2.0. Package install on centos 9 stream. Very smooth upgrade. Only one problem so far... The MGR restful api calls work fine. EXCEPT whenever the balancer kicks in to find any new plans. During the few seconds that the balancer takes to run, all REST calls seem to be completely dropped. The MGR log file normally logs the POST requests, but the ones during these few seconds don't appear at all. This causes our monitoring to keep raising alarms. The cluster is in a completely stable state, HEALTH_OK, very little activity, just the occasional scrubs. We use the restful API for monitoring (using the Ceph for Zabbix Agent 2 plugin, as Zabbix is the over-arching monitoring platform in the data centre). I haven't yet checked the memory leak problems that we (like many) were having, because I have been chasing this new problem. The problem is quite repeatable. To diagnose I use the zabbix_get utility to query every second. Whenever the MGR log file shows the balancer kick in the REST requests time out (after 3 seconds - not sure whether the utility or the MGR is timing them out - I suspect the utility). They normally complete after a small fraction of a second. With the balancer disabled the REST interface works reliably again. The problem does not occur pre-squid. Anyone any ideas, or shall I raise a bug? Thanks, Chris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Squid 19.2.0 balancer causes restful requests to be lost
Hi, Laura posted [0],[1] two days ago that she likely found the root cause of the balancer crashing the MGR. It sounds like what you're describing could be related to that. [0] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/STR2UCS2KDZQAXOLH3GPCCWN4GBR3CJG/ [1] https://tracker.ceph.com/issues/68657 Zitat von Chris Palmer : I've just upgraded a test cluster from 18.2.4 to 19.2.0. Package install on centos 9 stream. Very smooth upgrade. Only one problem so far... The MGR restful api calls work fine. EXCEPT whenever the balancer kicks in to find any new plans. During the few seconds that the balancer takes to run, all REST calls seem to be completely dropped. The MGR log file normally logs the POST requests, but the ones during these few seconds don't appear at all. This causes our monitoring to keep raising alarms. The cluster is in a completely stable state, HEALTH_OK, very little activity, just the occasional scrubs. We use the restful API for monitoring (using the Ceph for Zabbix Agent 2 plugin, as Zabbix is the over-arching monitoring platform in the data centre). I haven't yet checked the memory leak problems that we (like many) were having, because I have been chasing this new problem. The problem is quite repeatable. To diagnose I use the zabbix_get utility to query every second. Whenever the MGR log file shows the balancer kick in the REST requests time out (after 3 seconds - not sure whether the utility or the MGR is timing them out - I suspect the utility). They normally complete after a small fraction of a second. With the balancer disabled the REST interface works reliably again. The problem does not occur pre-squid. Anyone any ideas, or shall I raise a bug? Thanks, Chris ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Assistance Required: Ceph OSD Out of Memory (OOM) Issue
Dear Ceph Community,

I hope this message finds you well. I am encountering an out-of-memory (OOM) issue with one of my Ceph OSDs, which is repeatedly getting killed by the OOM killer on my system. Below are the relevant details from the log:

*OOM Log*:
[Wed Oct 30 13:14:48 2024] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice,task=ceph-osd,pid=6213,uid=64045
[Wed Oct 30 13:14:48 2024] Out of memory: Killed process 6213 (ceph-osd) total-vm:216486528kB, anon-rss:211821164kB, file-rss:0kB, shmem-rss:0kB, UID:64045 pgtables:418836kB oom_score_adj:0
[Wed Oct 30 13:14:58 2024] oom_reaper: reaped process 6213 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

*Ceph OSD Log*:
2024-10-30T13:15:30.207+0600 7f906c74dd80 0 _get_class not permitted to load lua
2024-10-30T13:15:30.211+0600 7f906c74dd80 0 /build/ceph-15.2.17/src/cls/hello/cls_hello.cc:312: loading cls_hello
2024-10-30T13:15:30.215+0600 7f906c74dd80 0 _get_class not permitted to load kvs
2024-10-30T13:15:30.219+0600 7f906c74dd80 0 _get_class not permitted to load queue
2024-10-30T13:15:30.223+0600 7f906c74dd80 0 /build/ceph-15.2.17/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs
2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has features 432629239337189376, adjusting msgr requires for clients
2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has features 432629239337189376 was 8705, adjusting msgr requires for mons
2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has features 3314933000854323200, adjusting msgr requires for osds
2024-10-30T13:15:30.223+0600 7f906c74dd80 1 osd.13 299547 check_osdmap_features require_osd_release unknown -> octopus
2024-10-30T13:15:31.023+0600 7f906c74dd80 0 osd.13 299547 load_pgs

*Environment Details*:
- Ceph Version: 15.2.17 (Octopus)
- OSD: osd.13
- Kernel: Linux kernel version

It seems that the OSD process is consuming a substantial amount of memory (total-vm: 216486528kB, anon-rss: 211821164kB), leading to OOM kills on the node. The OSD service restarts, but it continues to show excessive memory consumption and the OSD goes down again.

Could you please provide guidance or suggestions on how to mitigate this issue? Are there any known memory management settings, configuration adjustments, or OSD-specific tuning parameters that could help prevent this from recurring?

Any help would be greatly appreciated. Thank you for your time and assistance!

Regards
Mosharaf Hossain
Manager, Product Development
Bangladesh Online (BOL)
Level 8, SAM Tower, Plot 4, Road 22, Gulshan 1, Dhaka 1212, Bangladesh
Tel: +880 9609 000 999, +880 2 58815559, Ext: 14191, Fax: +880 2 95757
Cell: +880 1787 680828, Web: www.bol-online.com
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Squid 19.2.0 balancer causes restful requests to be lost
On Wed, Oct 30, 2024, 8:24 AM Chris Palmer wrote: > I've just upgraded a test cluster from 18.2.4 to 19.2.0. Package > install on centos 9 stream. Very smooth upgrade. Only one problem so far... > > The MGR restful api calls work fine. EXCEPT whenever the balancer kicks > in to find any new plans. During the few seconds that the balancer takes > to run, all REST calls seem to be completely dropped. The MGR log file > normally logs the POST requests, but the ones during these few seconds > don't appear at all. This causes our monitoring to keep raising alarms. > > The cluster is in a completely stable state, HEALTH_OK, very little > activity, just the occasional scrubs. > > We use the restful API for monitoring (using the Ceph for Zabbix Agent 2 > plugin, as Zabbix is the over-arching monitoring platform in the data > centre). I haven't yet checked the memory leak problems that we (like > many) were having, because I have been chasing this new problem. > > The problem is quite repeatable. To diagnose I use the zabbix_get > utility to query every second. Whenever the MGR log file shows the > balancer kick in the REST requests time out (after 3 seconds - not sure > whether the utility or the MGR is timing them out - I suspect the > utility). They normally complete after a small fraction of a second. > With the balancer disabled the REST interface works reliably again. > > The problem does not occur pre-squid. > > Anyone any ideas, or shall I raise a bug? > > Thanks, Chris > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io There's a (suspected) algorithmic issue wrt how upmaps are being processed as part of a Squid change. It sounds like you're hitting that. I'd suggest disabling the balancer until the issue is addressed in a subsequent Squid release. Tyler ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
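For anyone needing the interim workaround while staying on 19.2.0, a short sketch:

  # stop the automatic balancer until a fixed release is available
  ceph balancer off
  ceph balancer status     # confirm it is no longer active
  # re-enable later with: ceph balancer on

Note that this only pauses automatic optimization; existing upmap entries stay in place.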
[ceph-users] Re: Deploy custom mgr module
On 10/30/24 14:58, Tim Holloway wrote: Speaking abstractly, I can see 3 possible approaches. ... 2. You can create a Dockerfile based on the stock mgr but with your extensions added. The main problem with this is that from what I can see, the cephadm tool has the names and repositories of the stock containers hard-wired in. Which ensures quality (getting the right versions) and integrity (makes it hard for a bad agent to swap in a malware module). So more information is needed at least. ... ... And, we have our answer, courtesy of John Mulligan! There are ways to compact down layers in a container image, if that's a concern. Tim On Wed, 2024-10-30 at 18:00 +, Darrell Enns wrote: Is there a simple way to deploy a custom (in-house) mgr module to an orchestrator managed cluster? I assume the module code would need to be included in the mgr container image. However, there doesn't seem to be a straightforward way to do this without having the module merged to upstream ceph (not possible for a custom/in-house solution) or maintaining an in-house container repository and custom container images (a lot of extra maintenance overhead). Also, what's the best way to handle testing during development? Custom scripts to push the code into to a mgr container in a dev cluster? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Assistance Required: Ceph OSD Out of Memory (OOM) Issue
Hi Mosharaf, read this article to identify if you are facing this issue: https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/ Regards, Joachim www.clyso.com Hohenzollernstr. 27, 80801 Munich Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677 Am Mi., 30. Okt. 2024 um 08:27 Uhr schrieb Md Mosharaf Hossain < mosharaf.hoss...@bol-online.com>: > Dear Ceph Community, > > I hope this message finds you well. > > I am encountering an out-of-memory (OOM) issue with one of my Ceph OSDs, > which is repeatedly getting killed by the OOM killer on my system. Below > are the relevant details from the log: > > *OOM Log*: > [Wed Oct 30 13:14:48 2024] > > oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice,task=ceph-osd,pid=6213,uid=64045 > [Wed Oct 30 13:14:48 2024] Out of memory: Killed process 6213 (ceph-osd) > total-vm:216486528kB, anon-rss:211821164kB, file-rss:0kB, shmem-rss:0kB, > UID:64045 pgtables:418836kB oom_score_adj:0 > [Wed Oct 30 13:14:58 2024] oom_reaper: reaped process 6213 (ceph-osd), now > anon-rss:0kB, file-rss:0kB, shmem-rss:0kB > > *Ceph OSD Log*: > > 2024-10-30T13:15:30.207+0600 7f906c74dd80 0 _get_class not permitted to > load lua > 2024-10-30T13:15:30.211+0600 7f906c74dd80 0 > /build/ceph-15.2.17/src/cls/hello/cls_hello.cc:312: loading cls_hello > 2024-10-30T13:15:30.215+0600 7f906c74dd80 0 _get_class not permitted to > load kvs > 2024-10-30T13:15:30.219+0600 7f906c74dd80 0 _get_class not permitted to > load queue > 2024-10-30T13:15:30.223+0600 7f906c74dd80 0 > /build/ceph-15.2.17/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs > 2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has > features 432629239337189376, adjusting msgr requires for clients > 2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has > features 432629239337189376 was 8705, adjusting msgr requires for mons > 2024-10-30T13:15:30.223+0600 7f906c74dd80 0 osd.13 299547 crush map has > features 3314933000854323200, adjusting msgr requires for osds > 2024-10-30T13:15:30.223+0600 7f906c74dd80 1 osd.13 299547 > check_osdmap_features require_osd_release unknown -> octopus > 2024-10-30T13:15:31.023+0600 7f906c74dd80 0 osd.13 299547 load_pgs > *Environment Details*: > >- Ceph Version: 15.2.17 (Octopus) >- OSD: osd.13 >- Kernel: Linux kernel version > > It seems that the OSD process is consuming a substantial amount of > memory (total-vm: > 216486528kB, anon-rss: 211821164kB), leading to OOM kills on the node. The > OSD service restarts but continues to showing consumption excessive memory > and OSD get down. > > Could you please provide guidance or suggestions on how to mitigate this > issue? Are there any known memory management settings, configuration > adjustments, or OSD-specific tuning parameters that could help prevent this > from recurring? > > Any help would be greatly appreciated. > > Thank you for your time and assistance! > > > > Regards > Mosharaf Hossain > Manager, Product Development > Bangladesh Online (BOL) > > Level 8, SAM Tower, Plot 4, Road 22, Gulshan 1, Dhaka 1212, Bangladesh > Tel: +880 9609 000 999, +880 2 58815559, Ext: 14191, Fax: +880 2 95757 > Cell: +880 1787 680828, Web: www.bol-online.com > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
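Alongside the article, a hedged sketch of the first checks on a node like this, assuming an otherwise default Octopus memory configuration (the values shown are examples only):

  # what the OSD is being asked to stay within (the default is 4 GiB)
  ceph config get osd.13 osd_memory_target
  # where the memory is actually going, via the admin socket on the OSD host
  ceph daemon osd.13 dump_mempools
  # optionally pin this one OSD lower while investigating (4 GiB shown)
  ceph config set osd.13 osd_memory_target 4294967296

Keep in mind that osd_memory_target is a soft target, not a hard limit, so growth to 200+ GB as in the log above usually points at something else entirely, such as the pg_log/dups growth described in the linked article.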