[ceph-users] Re: no recovery running

2024-10-30 Thread Alex Walender

Hey Joffrey,

try to switch back to the wpq scheduler in ceph.conf:

osd_op_queue = wpq

...and restart all OSDs.
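
For example (just a sketch, adjust to your environment; the config database
route is equivalent to editing ceph.conf):

# ceph.conf variant:
[osd]
    osd_op_queue = wpq

# or centrally, via the config database:
ceph config set osd osd_op_queue wpq

# the scheduler is only read at OSD startup, so restart the OSDs,
# e.g. per host on a package-based install:
systemctl restart ceph-osd.target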

I also had issues where recovery was very very slow (10kb/s).

Best Regards,
Alex Walender


On 17.10.24 at 11:44, Joffrey wrote:

Hi,


This is my cluster:

  cluster:
    id:     c300532c-51fa-11ec-9a41-0050569c3b55
    health: HEALTH_WARN
            Degraded data redundancy: 2062374/1331064781 objects degraded (0.155%), 278 pgs degraded, 40 pgs undersized
            2497 pgs not deep-scrubbed in time
            2497 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum hbgt-ceph1-mon1,hbgt-ceph1-mon2,hbgt-ceph1-mon3 (age 9d)
    mgr: hbgt-ceph1-mon3.gmfzqm(active, since 10d), standbys: hbgt-ceph1-mon2.nteihj, hbgt-ceph1-mon1.thrnnu
    osd: 96 osds: 96 up (since 9d), 96 in (since 45h); 1588 remapped pgs
    rgw: 3 daemons active (3 hosts, 2 zones)

  data:
    pools:   16 pools, 2497 pgs
    objects: 266.22M objects, 518 TiB
    usage:   976 TiB used, 808 TiB / 1.7 PiB avail
    pgs:     2062374/1331064781 objects degraded (0.155%)
             349917519/1331064781 objects misplaced (26.289%)
             1312 active+remapped+backfill_wait
             864  active+clean
             199  active+recovery_wait+degraded+remapped
             38   active+recovery_wait+degraded
             33   active+undersized+degraded+remapped+backfill_wait
             33   active+recovery_wait+remapped
             7    active+recovery_wait
             6    active+undersized+degraded+remapped+backfilling
             2    active+recovering+remapped
             1    active+remapped+backfilling
             1    active+recovering+degraded+remapped
             1    active+recovery_wait+undersized+degraded+remapped

  io:
    client:   683 KiB/s rd, 2.2 KiB/s wr, 51 op/s rd, 2 op/s wr


No recovery is running and I don't understand why.
I have free space:

ID  CLASS  WEIGHT      REWEIGHT  SIZE     RAW USE  DATA     OMAP     META     AVAIL    %USE   VAR   PGS  STATUS  TYPE NAME
-1         1784.12231         -  1.7 PiB  976 TiB  895 TiB  298 GiB  4.1 TiB  808 TiB  54.72  1.00    -          root default
-5          208.09680         -  208 TiB  142 TiB  130 TiB   51 GiB  605 GiB   66 TiB  68.14  1.25    -          host hbgt-ceph1-osd01
 1  hdd      17.34140       1.0   17 TiB   11 TiB   11 TiB   33 KiB   49 GiB  5.9 TiB  66.16  1.21  136      up  osd.1
 3  hdd      17.34140       1.0   17 TiB   11 TiB   10 TiB   23 GiB   49 GiB  6.3 TiB  63.80  1.17  139      up  osd.3
 5  hdd      17.34140       1.0   17 TiB   13 TiB   12 TiB  139 MiB   53 GiB  4.8 TiB  72.31  1.32  142      up  osd.5
 7  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB   11 GiB   51 GiB  5.6 TiB  67.97  1.24  145      up  osd.7
 9  hdd      17.34140       1.0   17 TiB   11 TiB   10 TiB  2.2 GiB   49 GiB  6.0 TiB  65.67  1.20  140      up  osd.9
11  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB  329 MiB   50 GiB  5.5 TiB  68.42  1.25  145      up  osd.11
13  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB  1.5 GiB   52 GiB  5.1 TiB  70.45  1.29  153      up  osd.13
15  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB   61 KiB   48 GiB  5.7 TiB  66.85  1.22  144      up  osd.15
17  hdd      17.34140       1.0   17 TiB   11 TiB  9.5 TiB  272 MiB   45 GiB  6.8 TiB  60.63  1.11  120      up  osd.17
19  hdd      17.34140       1.0   17 TiB   11 TiB   10 TiB   12 GiB   50 GiB  5.9 TiB  65.90  1.20  134      up  osd.19
21  hdd      17.34140       1.0   17 TiB   13 TiB   12 TiB  1.6 GiB   57 GiB  4.1 TiB  76.49  1.40  152      up  osd.21
23  hdd      17.34140       1.0   17 TiB   13 TiB   12 TiB   31 KiB   54 GiB  4.7 TiB  73.10  1.34  124      up  osd.23
-3          208.09680         -  208 TiB  146 TiB  134 TiB   64 GiB  629 GiB   62 TiB  70.05  1.28    -          host hbgt-ceph1-osd02
 0  hdd      17.34140       1.0   17 TiB   11 TiB  9.8 TiB   22 GiB   49 GiB  6.6 TiB  62.07  1.13  124      up  osd.0
 2  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB  1.7 GiB   52 GiB  5.2 TiB  70.14  1.28  150      up  osd.2
 4  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB  1.8 GiB   48 GiB  5.8 TiB  66.83  1.22  152      up  osd.4
 6  hdd      17.34140   0.85004   17 TiB   13 TiB   12 TiB   11 GiB   58 GiB  4.0 TiB  76.85  1.40  153      up  osd.6
 8  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB   11 GiB   54 GiB  4.9 TiB  71.58  1.31  152      up  osd.8
10  hdd      17.34140       1.0   17 TiB   11 TiB   10 TiB  6.3 MiB   47 GiB  6.1 TiB  64.91  1.19  133      up  osd.10
12  hdd      17.34140       1.0   17 TiB   12 TiB   11 TiB  109 MiB   51 GiB  5.6 TiB  67.72  1.24  137      up  osd.12
14  hdd      17.34140

[ceph-users] Re: Destroyed OSD clinging to wrong disk

2024-10-30 Thread Tim Holloway
Dave,

If there's one bitter lesson I learned from IBM's OS/2 OS it was that
one should never store critical information in two different
repositories. There Should Be Only One, and you may replicate it, but
at the end of the day, if you don't have a single point of Authority,
you'll suffer.

Regrettably, Ceph has issues there. Very frequently data displayed in
the Dashboard does not match data from the Ceph command line. Which to
me indicates that the information isn't always coming from the same
place.

To be clear, I'm not talking about the old /etc/ceph stuff versus the
more modern configuration database, I'm talking about cases where
apparently sometimes info comes from components (such as direct from an
OSD) and sometimes from somewhere else and they're not staying in sync.

I feel your pain. For certain versions of Ceph, it is possible to have
the same OSD defined both as administered and legacy. The administered
stuff tends to have dynamically-defined systemd units, which means you
can't simply delete the offending service file. Or even find it, unless
you know where such things live.

Go back through this list's history to about June and you'll see a lot
of wailing from me about that sort of thing and the "phantom host"
issue, where a non-ceph host managed to insinuate itself into the mix
and took forever to expunge. I'm very grateful to Eugen for the help
there. It's possible you might find some insights if you wade through
it.

To the best of my knowledge everything relating to an OSD resides in
one of three places:

1. The /etc/ceph directory (mostly deprecated except for maybe
keyring). And of course, the FSID!

2. The Ceph configuration repository (possibly keyring, not sure if
much else).

3. The Ceph OSD directory under /var/lib/ceph. Whether legacy or
administered, the exact path may differ, but the overall layout is the
same. One directory per OSD. Everything important relating to the OSD
is there, or at least linked from there.

You haven't fully purged a defective OSD until it no longer has a
presence in the "ceph osd tree" command, the "ceph orch ps"
command, or the OSD host's systemctl list as an "osd" service.

Which is easier said than done, but setting the unwanted OSD's weights
to 0 is a major help.
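
As a rough sketch (osd.99 stands in for the unwanted OSD's id; adapt
before running anything):

ceph osd tree | grep -w osd.99
ceph orch ps --daemon-type osd | grep -w osd.99
systemctl list-units --all 'ceph*osd*'     # run this one on the OSD host
ceph osd crush reweight osd.99 0           # take its weight out of CRUSH first
ceph osd purge 99 --yes-i-really-mean-it   # then drop it from CRUSH, auth and the osd map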

In one particular case where I had a doubly-defined OSD, I think I
ultimately cured it by turning off the OSD, deleting an OSD service
file for the legacy OSD definition from /etc/systemd/system, then
drawing a deep breath and doing a "rm -rf /var/lib/ceph/osd.xx",
leaving the /var/lib/ceph//osd.xxx alone. Followed by an OSD
restart. But do check my previously-mentioned messages to make sure
there aren't some "gotchas" that I forgot.

If you have issues with the raw data store under the OSD, then it would
take someone wiser and braver than me to repair it without first
deleting  all OSD definitions that reference it, zapping the raw data
store to remove all ceph admin and LVM info that might offend ceph,
then re-defining the OSD on the cleaned data store.
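
A minimal sketch of that route (the device path and VG/LV name below are
made up; check "ceph-volume lvm list" for the real ones before zapping
anything):

ceph-volume lvm list                              # shows which LVs/devices belong to which OSD
ceph-volume lvm zap --destroy /dev/sdc            # wipe the data device
ceph-volume lvm zap --destroy ceph-db/db-osd99    # wipe a leftover WAL/DB LV, if any

Zapping a single LV like that is also how a stale WAL/DB LV on a shared
NVMe can be cleared without touching the other OSDs' LVs on the same drive.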

While Ceph can be a bit crotchety, I'll give it credit for one thing.
Even broken it's robust enough I've never lost or corrupted the actual
data, despite the fact that I've done an uncomfortable amount of stuff
where I'm just randomly banging on things with a hammer.

I still do backups, though. :)

Now if I could just persuade the auto-tuner to actually adjust the pg
sizes the way I told it to.

   Tim

On Tue, 2024-10-29 at 22:37 -0400, Dave Hall wrote:
> Tim,
> 
> Thank you for your guidance.  Your points are completely understood. 
> It
> was more that I couldn't figure out why the Dashboard was telling me
> that
> the destroyed OSD was still using /dev/sdi when the physical disk
> with that
> serial number was at /dev/sdc, and when another OSD was also
> reporting
> /dev/sdi.  I figured that there must be some information buried
> somewhere.
> I don't know where this metadata comes from or how it gets updated
> when
> things like 'drive letters' change, but the metadata matched what the
> dashboard showed, so now I know something new.
> 
> Regarding the process for bringing the OSD back online with a new
> HDD, I am
> still having some difficulties.  I used the steps in the
> Adding/Removing
> OSDs document under Removing the OSD, and the OSD mostly appears to
> be
> gone.  However, attempts to use 'ceph-volume lvm prepare' to build
> the
> replacement OSD are failing. Same thing with 'ceph orch daemon add
> osd'.
> 
> I think the problem might be that the NVMe LV that was the WAL/DB for
> the
> failed OSD did not get cleaned up, but on my systems 4 OSDs use the
> same
> NVMe drive for WAL/DB, so I'm not sure how to proceed.
> 
> Any suggestions would be welcome.
> 
> Thanks.
> 
> -Dave
> 
> --
> Dave Hall
> Binghamton University
> kdh...@binghamton.edu
> 
> 
> On Tue, Oct 29, 2024 at 3:13 PM Tim Holloway 
> wrote:
> 
> > Take care when reading the output of "ceph osd metadata". When you
> > are
> > running the OS

[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread John Mulligan
On Wednesday, October 30, 2024 2:00:56 PM EDT Darrell Enns wrote:
> Is there a simple way to deploy a custom (in-house) mgr module to an
> orchestrator managed cluster? I assume the module code would need to be
> included in the mgr container image. However, there doesn't seem to be a
> straightforward way to do this without having the module merged to upstream
> ceph (not possible for a custom/in-house solution) or maintaining an
> in-house container repository and custom container images (a lot of extra
> maintenance overhead).
> 
> Also, what's the best way to handle testing during development? Custom
> scripts to push the code into to a mgr container in a dev cluster?

There's a tool in the ceph tree designed for this: src/script/cpatch.py

(there's also an older shell based version in the same dir but that is not 
maintained WRT python changes as far as I know).

There are some downsides to how this script works, as it creates many layers, 
but it's intended for development and in these cases having extra container 
layers is not usually a big deal.  If you are working on a python based mgr 
module there are flags you can pass to have the script only create images with 
your local version of src/pybind/mgr (or whatnot, see the --help for more 
info).
If you need extra help with the script, just ask here on the list. I use it 
frequently and have been maintaining it.

When you build an image you can then push it to a private or public registry
and configure ceph(adm) to use it. (See
https://docs.ceph.com/en/latest/cephadm/install/#deployment-in-an-isolated-environment
for some hints)
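
Roughly (registry, tag and daemon name are placeholders; the exact
redeploy syntax can differ between releases, so check
"ceph orch daemon redeploy -h"):

podman push registry.example.com/ceph/ceph-dev:latest
ceph orch daemon redeploy mgr.hostA.abcdef registry.example.com/ceph/ceph-dev:latest
ceph orch ps --daemon-type mgr     # verify the mgr came back with the new image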






___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread Joachim Kraftmayer
Build your own image based on the Ceph container image.
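
A minimal sketch of what that could look like (module name, registry and
tag are placeholders; mgr modules live under /usr/share/ceph/mgr inside
the image):

cat > Dockerfile <<'EOF'
FROM quay.io/ceph/ceph:v19
COPY my_module /usr/share/ceph/mgr/my_module
EOF
podman build -t registry.example.com/ceph/ceph-custom:v19 .
podman push registry.example.com/ceph/ceph-custom:v19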



Joachim Kraftmayer

CEO

joachim.kraftma...@clyso.com

www.clyso.com

Hohenzollernstr. 27, 80801 Munich

Utting a. A. | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE2754306

Darrell Enns wrote on Wed., 30 Oct. 2024, 19:01:

> Is there a simple way to deploy a custom (in-house) mgr module to an
> orchestrator managed cluster? I assume the module code would need to be
> included in the mgr container image. However, there doesn't seem to be a
> straightforward way to do this without having the module merged to upstream
> ceph (not possible for a custom/in-house solution) or maintaining an
> in-house container repository and custom container images (a lot of extra
> maintenance overhead).
>
> Also, what's the best way to handle testing during development? Custom
> scripts to push the code into to a mgr container in a dev cluster?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread Tim Holloway
Speaking abstractly, I can see 3 possible approaches.

1. You can create a separate container and invoke it from the mgr
container as a micro-service. As to how, I don't know. This is likely
the cleanest approach.

2. You can create a Dockerfile based on the stock mgr but with your
extensions added. The main problem with this is that from what I can
see, the cephadm tool has the names and repositories of the stock
containers hard-wired in. Which ensures quality (getting the right
versions) and integrity (makes it hard for a bad agent to swap in a
malware module). So more information is needed at least.

3. You can inject your code into the stock container image by packaging
it as an RPM, adding a local RPM repository to the stock container, and
installing the extra code with something like "docker exec -it [cephadm-
container-name] /usr/bin/dnf install mycode".

The third option does require that the infrastructure to run dnf/yum
hasn't been removed from the container image. Also note that if you're
running a dynamic container launch, you might have to deal with having
to re-install your code every time the container launches, because there
would be no persistent image.

However, option 3 would, if the stars are right, be something that
Ansible could easily handle.

As for testing, I'd look at the source for the mgr module and its
regression tests. Plus of course testing your own code is something
you'd have to do yourself.

   Tim

On Wed, 2024-10-30 at 18:00 +, Darrell Enns wrote:
> Is there a simple way to deploy a custom (in-house) mgr module to an
> orchestrator managed cluster? I assume the module code would need to
> be included in the mgr container image. However, there doesn't seem
> to be a straightforward way to do this without having the module
> merged to upstream ceph (not possible for a custom/in-house solution)
> or maintaining an in-house container repository and custom container
> images (a lot of extra maintenance overhead).
> 
> Also, what's the best way to handle testing during development?
> Custom scripts to push the code into to a mgr container in a dev
> cluster?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: "ceph orch" not working anymore

2024-10-30 Thread Malte Stroem

Hello Eugen,

thanks a lot. We got our down time today to work on the cluster.

However, nothing worked. Even with Ceph 19.

None of the ceph orch commands work.

Error ENOENT: No orchestrator configured (try `ceph orch set backend`)

This has nothing to do with osd_remove_queue.
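
For anyone finding this later: the backend is normally (re)selected with
the commands below, but as said, no ceph orch command worked here:

ceph mgr module enable cephadm
ceph orch set backend cephadm
ceph orch status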

Getting back the MON quorum with three MONs and also three MGRs with 
Squid did not help at all.


I still think this could be fixed somehow, perhaps by editing the mon
store, but I don't know where.


We decided to deploy a new cluster since backups are available.

Thanks again everybody.

Best,
Malte

On 18.10.24 16:37, Eugen Block wrote:

Hi Malte,

so I would only suggest to bring up a new MGR, issue a failover to that 
MGR and see if you get the orchestrator to work again.
It should suffice to change the container_image in the unit.run file
(/var/lib/ceph/{FSID}/mgr.{MGR}/unit.run):


CONTAINER_IMAGE={NEWER IMAGE}

So stop one MGR, change the container image, start it and make sure it 
takes over as the active MGR.


But I would like to know if I could replace the cephadm on one 
running node, stop the MGR and deploy a new MGR on that node with this:


https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon


cephadm --image  deploy --fsid  --name 
mgr.hostname.smfvfd --config-json config-json.json


This approach probably works as well, but I haven't tried that yet.


And I still do not know what places cephadm... under /var/lib/ceph/fsid.

Does that happen when I enable the orchestrator in the MGR?

And can I replace that cephadm by hand?


The orchestrator would automatically download the respective cephadm 
image into that directory if you changed the container_image config 
value(s). But I wouldn't do that because you could break your cluster. 
If for some reason a MON, OSD or some other Ceph daemon would need to be 
redeployed, you would basically upgrade it. That's why I would suggest 
to only start one single MGR daemon with a newer version to see how it 
goes. In case you get the orchestrator to work again, I would 
"downgrade" it again and see what happens next.



Quoting Eugen Block:


I’m on a mobile phone right now, so I can’t go into much detail.
But I don’t think it’s necessary to rebuild an entire node, just a 
mgr. Otherwise you risk cluster integrity if you redeploy a mon as 
well with a newer image. I’ll respond later in more detail.


Quoting Malte Stroem:


Well, thank you, Eugen. That is what I planned to do.

Rebuild the broken node and start a MON and a MGR there with the 
latest images. Then I will stop the other MGRs and have a look if 
it's working.


But I would like to know if I could replace the cephadm on one 
running node, stop the MGR and deploy a new MGR on that node with this:


https://docs.ceph.com/en/latest/cephadm/troubleshooting/#manually-deploying-a-manager-daemon


cephadm --image  deploy --fsid  --name 
mgr.hostname.smfvfd --config-json config-json.json


And I still do not know what places cephadm... under /var/lib/ceph/fsid.

Does that happen when I enable the orchestrator in the MGR?

And can I replace that cephadm by hand?

Best,
Malte

On 18.10.24 12:11, Eugen Block wrote:

Okay, then I misinterpreted your former statement:

 I think there are entries of the OSDs from the broken node we 
removed.


So the stack trace in the log points to the osd_remove_queue, but I 
don't understand why it's empty. Is there still some OSD removal 
going on or something? Did you paste your current cluster status 
already? You could probably try starting a Squid mgr daemon by 
replacing the container image in the unit.run file and see how that 
goes.


Quoting Malte Stroem:


Hello Eugen,

thanks a lot. However:

ceph config-key get mgr/cephadm/osd_remove_queue

is empty!

Damn.

So should I get a new cephadm with the diff included?

Best,
Malte

On 17.10.24 23:48, Eugen Block wrote:

Save the current output to a file:

ceph config-key get mgr/cephadm/osd_remove_queue > remove_queue.json

Then remove the original_weight key from the json and set the 
modified key again with:

ceph config-key set …
Then fail the mgr.
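
Something like this, assuming the stored value is a JSON list of OSD
entries and jq is available:

ceph config-key get mgr/cephadm/osd_remove_queue > remove_queue.json
jq 'map(del(.original_weight))' remove_queue.json > remove_queue_fixed.json
ceph config-key set mgr/cephadm/osd_remove_queue -i remove_queue_fixed.json
ceph mgr fail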

Quoting Malte Stroem:


Hello Frederic, Hello Eugen,

yes, but I am not sure how to do it.

The links says:


the config-key responsible was mgr/cephadm/osd_remove_queue

This is what it looked like before.  After removing the 
original_weight field and setting the variable again, the 
cephadm module loads and orch works.


So now: Do I remove the value of mgr/cephadm/osd_remove_queue?

Or:

What is meant by:

"After removing the original_weight field and setting the 
variable again, the cephadm module loads and orch works."


I can enter a MGR's container and open the file:

/usr/share/ceph/mgr/cephadm/services/osd.py

But what is meant by "removing the original_weight field and 
setting the variable again" and what JSON do you mean, Eugen?



osd_obj = OSD.from_json(osd, rm_util=self.rm_util)


Code looks like t

[ceph-users] Deploy custom mgr module

2024-10-30 Thread Darrell Enns
Is there a simple way to deploy a custom (in-house) mgr module to an 
orchestrator managed cluster? I assume the module code would need to be 
included in the mgr container image. However, there doesn't seem to be a 
straightforward way to do this without having the module merged to upstream 
ceph (not possible for a custom/in-house solution) or maintaining an in-house 
container repository and custom container images (a lot of extra maintenance 
overhead).

Also, what's the best way to handle testing during development? Custom scripts 
to push the code into to a mgr container in a dev cluster?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Squid 19.2.0 balancer causes restful requests to be lost

2024-10-30 Thread Chris Palmer
I've just upgraded a test cluster from 18.2.4 to 19.2.0.  Package 
install on centos 9 stream. Very smooth upgrade. Only one problem so far...


The MGR restful api calls work fine. EXCEPT whenever the balancer kicks 
in to find any new plans. During the few seconds that the balancer takes 
to run, all REST calls seem to be completely dropped. The MGR log file 
normally logs the POST requests, but the ones during these few seconds 
don't appear at all. This causes our monitoring to keep raising alarms.


The cluster is in a completely stable state, HEALTH_OK, very little 
activity, just the occasional scrubs.


We use the restful API for monitoring (using the Ceph for Zabbix Agent 2 
plugin, as Zabbix is the over-arching monitoring platform in the data 
centre). I haven't yet checked the memory leak problems that we (like 
many) were having, because I have been chasing this new problem.


The problem is quite repeatable. To diagnose I use the zabbix_get 
utility to query every second. Whenever the MGR log file shows the 
balancer kick in the REST requests time out (after 3 seconds - not sure 
whether the utility or the MGR is timing them out - I suspect the 
utility). They normally complete after a small fraction of a second. 
With the balancer disabled the REST interface works reliably again.


The problem does not occur pre-squid.

Anyone any ideas, or shall I raise a bug?

Thanks, Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Squid 19.2.0 balancer causes restful requests to be lost

2024-10-30 Thread Eugen Block

Hi,

Laura posted [0],[1] two days ago that she likely found the root cause  
of the balancer crashing the MGR. It sounds like what you're  
describing could be related to that.


[0]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/STR2UCS2KDZQAXOLH3GPCCWN4GBR3CJG/

[1] https://tracker.ceph.com/issues/68657

Quoting Chris Palmer:

I've just upgraded a test cluster from 18.2.4 to 19.2.0.  Package  
install on centos 9 stream. Very smooth upgrade. Only one problem so  
far...


The MGR restful api calls work fine. EXCEPT whenever the balancer  
kicks in to find any new plans. During the few seconds that the  
balancer takes to run, all REST calls seem to be completely dropped.  
The MGR log file normally logs the POST requests, but the ones  
during these few seconds don't appear at all. This causes our  
monitoring to keep raising alarms.


The cluster is in a completely stable state, HEALTH_OK, very little  
activity, just the occasional scrubs.


We use the restful API for monitoring (using the Ceph for Zabbix  
Agent 2 plugin, as Zabbix is the over-arching monitoring platform in  
the data centre). I haven't yet checked the memory leak problems  
that we (like many) were having, because I have been chasing this  
new problem.


The problem is quite repeatable. To diagnose I use the zabbix_get  
utility to query every second. Whenever the MGR log file shows the  
balancer kick in the REST requests time out (after 3 seconds - not  
sure whether the utility or the MGR is timing them out - I suspect  
the utility). They normally complete after a small fraction of a  
second. With the balancer disabled the REST interface works reliably  
again.


The problem does not occur pre-squid.

Anyone any ideas, or shall I raise a bug?

Thanks, Chris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Assistance Required: Ceph OSD Out of Memory (OOM) Issue

2024-10-30 Thread Md Mosharaf Hossain
Dear Ceph Community,

I hope this message finds you well.

I am encountering an out-of-memory (OOM) issue with one of my Ceph OSDs,
which is repeatedly getting killed by the OOM killer on my system. Below
are the relevant details from the log:

*OOM Log*:
[Wed Oct 30 13:14:48 2024]
oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice,task=ceph-osd,pid=6213,uid=64045
[Wed Oct 30 13:14:48 2024] Out of memory: Killed process 6213 (ceph-osd)
total-vm:216486528kB, anon-rss:211821164kB, file-rss:0kB, shmem-rss:0kB,
UID:64045 pgtables:418836kB oom_score_adj:0
[Wed Oct 30 13:14:58 2024] oom_reaper: reaped process 6213 (ceph-osd), now
anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

*Ceph OSD Log*:

2024-10-30T13:15:30.207+0600 7f906c74dd80  0 _get_class not permitted to
load lua
2024-10-30T13:15:30.211+0600 7f906c74dd80  0 
/build/ceph-15.2.17/src/cls/hello/cls_hello.cc:312: loading cls_hello
2024-10-30T13:15:30.215+0600 7f906c74dd80  0 _get_class not permitted to
load kvs
2024-10-30T13:15:30.219+0600 7f906c74dd80  0 _get_class not permitted to
load queue
2024-10-30T13:15:30.223+0600 7f906c74dd80  0 
/build/ceph-15.2.17/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs
2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
features 432629239337189376, adjusting msgr requires for clients
2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
features 432629239337189376 was 8705, adjusting msgr requires for mons
2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
features 3314933000854323200, adjusting msgr requires for osds
2024-10-30T13:15:30.223+0600 7f906c74dd80  1 osd.13 299547
check_osdmap_features require_osd_release unknown -> octopus
2024-10-30T13:15:31.023+0600 7f906c74dd80  0 osd.13 299547 load_pgs
*Environment Details*:

   - Ceph Version: 15.2.17 (Octopus)
   - OSD: osd.13
   - Kernel: Linux kernel version

It seems that the OSD process is consuming a substantial amount of
memory (total-vm:
216486528kB, anon-rss: 211821164kB), leading to OOM kills on the node. The
OSD service restarts, but it continues to show excessive memory consumption
and the OSD goes down again.

Could you please provide guidance or suggestions on how to mitigate this
issue? Are there any known memory management settings, configuration
adjustments, or OSD-specific tuning parameters that could help prevent this
from recurring?

Any help would be greatly appreciated.

Thank you for your time and assistance!



Regards
Mosharaf Hossain
Manager, Product Development
Bangladesh Online (BOL)

Level 8, SAM Tower, Plot 4, Road 22, Gulshan 1, Dhaka 1212, Bangladesh
Tel: +880 9609 000 999, +880 2 58815559, Ext: 14191, Fax: +880 2  95757
Cell: +880 1787 680828, Web: www.bol-online.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Squid 19.2.0 balancer causes restful requests to be lost

2024-10-30 Thread Tyler Stachecki
On Wed, Oct 30, 2024, 8:24 AM Chris Palmer  wrote:

> I've just upgraded a test cluster from 18.2.4 to 19.2.0.  Package
> install on centos 9 stream. Very smooth upgrade. Only one problem so far...
>
> The MGR restful api calls work fine. EXCEPT whenever the balancer kicks
> in to find any new plans. During the few seconds that the balancer takes
> to run, all REST calls seem to be completely dropped. The MGR log file
> normally logs the POST requests, but the ones during these few seconds
> don't appear at all. This causes our monitoring to keep raising alarms.
>
> The cluster is in a completely stable state, HEALTH_OK, very little
> activity, just the occasional scrubs.
>
> We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
> plugin, as Zabbix is the over-arching monitoring platform in the data
> centre). I haven't yet checked the memory leak problems that we (like
> many) were having, because I have been chasing this new problem.
>
> The problem is quite repeatable. To diagnose I use the zabbix_get
> utility to query every second. Whenever the MGR log file shows the
> balancer kick in the REST requests time out (after 3 seconds - not sure
> whether the utility or the MGR is timing them out - I suspect the
> utility). They normally complete after a small fraction of a second.
> With the balancer disabled the REST interface works reliably again.
>
> The problem does not occur pre-squid.
>
> Anyone any ideas, or shall I raise a bug?
>
> Thanks, Chris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


There's a (suspected) algorithmic issue wrt how upmaps are being processed
as part of a Squid change. It sounds like you're hitting that. I'd suggest
disabling the balancer until the issue is addressed in a subsequent Squid
release.
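
i.e. something along these lines (manual runs are still possible with
automatic balancing off):

ceph balancer off
ceph balancer status
# when you do want to rebalance, run it by hand at a quiet time:
ceph balancer optimize myplan
ceph balancer eval myplan
ceph balancer execute myplan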

Tyler
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Deploy custom mgr module

2024-10-30 Thread Tim Holloway



On 10/30/24 14:58, Tim Holloway wrote:

Speaking abstractly, I can see 3 possible approaches.

...

2. You can create a Dockerfile based on the stock mgr but with your
extensions added. The main problem with this is that from what I can
see, the cephadm tool has the names and repositories of the stock
containers hard-wired in. Which ensures quality (getting the right
versions) and integrity (makes it hard for a bad agent to swap in a
malware module). So more information is needed at least.

...
... And, we have our answer, courtesy of John Mulligan! There are ways 
to compact down layers in a container image, if that's a concern.

Tim

On Wed, 2024-10-30 at 18:00 +, Darrell Enns wrote:

Is there a simple way to deploy a custom (in-house) mgr module to an
orchestrator managed cluster? I assume the module code would need to
be included in the mgr container image. However, there doesn't seem
to be a straightforward way to do this without having the module
merged to upstream ceph (not possible for a custom/in-house solution)
or maintaining an in-house container repository and custom container
images (a lot of extra maintenance overhead).

Also, what's the best way to handle testing during development?
Custom scripts to push the code into to a mgr container in a dev
cluster?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Assistance Required: Ceph OSD Out of Memory (OOM) Issue

2024-10-30 Thread Joachim Kraftmayer
Hi Mosharaf,

read this article to identify if you are facing this issue:
https://docs.clyso.com/blog/osds-with-unlimited-ram-growth/
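
Independent of that particular bug, two quick checks that may help narrow
it down (note that osd_memory_target only steers the caches, it is not a
hard limit and will not stop a real leak):

ceph config get osd.13 osd_memory_target    # what the OSD is currently aiming for
ceph daemon osd.13 dump_mempools            # run on the OSD host; shows which pools are growing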

Regards, Joachim


  www.clyso.com

  Hohenzollernstr. 27, 80801 Munich

Utting | HR: Augsburg | HRB: 25866 | USt. ID-Nr.: DE275430677



On Wed., 30 Oct. 2024 at 08:27, Md Mosharaf Hossain <
mosharaf.hoss...@bol-online.com> wrote:

> Dear Ceph Community,
>
> I hope this message finds you well.
>
> I am encountering an out-of-memory (OOM) issue with one of my Ceph OSDs,
> which is repeatedly getting killed by the OOM killer on my system. Below
> are the relevant details from the log:
>
> *OOM Log*:
> [Wed Oct 30 13:14:48 2024]
>
> oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/system-ceph\x2dosd.slice,task=ceph-osd,pid=6213,uid=64045
> [Wed Oct 30 13:14:48 2024] Out of memory: Killed process 6213 (ceph-osd)
> total-vm:216486528kB, anon-rss:211821164kB, file-rss:0kB, shmem-rss:0kB,
> UID:64045 pgtables:418836kB oom_score_adj:0
> [Wed Oct 30 13:14:58 2024] oom_reaper: reaped process 6213 (ceph-osd), now
> anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
>
> *Ceph OSD Log*:
>
> 2024-10-30T13:15:30.207+0600 7f906c74dd80  0 _get_class not permitted to
> load lua
> 2024-10-30T13:15:30.211+0600 7f906c74dd80  0 
> /build/ceph-15.2.17/src/cls/hello/cls_hello.cc:312: loading cls_hello
> 2024-10-30T13:15:30.215+0600 7f906c74dd80  0 _get_class not permitted to
> load kvs
> 2024-10-30T13:15:30.219+0600 7f906c74dd80  0 _get_class not permitted to
> load queue
> 2024-10-30T13:15:30.223+0600 7f906c74dd80  0 
> /build/ceph-15.2.17/src/cls/cephfs/cls_cephfs.cc:198: loading cephfs
> 2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
> features 432629239337189376, adjusting msgr requires for clients
> 2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
> features 432629239337189376 was 8705, adjusting msgr requires for mons
> 2024-10-30T13:15:30.223+0600 7f906c74dd80  0 osd.13 299547 crush map has
> features 3314933000854323200, adjusting msgr requires for osds
> 2024-10-30T13:15:30.223+0600 7f906c74dd80  1 osd.13 299547
> check_osdmap_features require_osd_release unknown -> octopus
> 2024-10-30T13:15:31.023+0600 7f906c74dd80  0 osd.13 299547 load_pgs
> *Environment Details*:
>
>- Ceph Version: 15.2.17 (Octopus)
>- OSD: osd.13
>- Kernel: Linux kernel version
>
> It seems that the OSD process is consuming a substantial amount of
> memory (total-vm:
> 216486528kB, anon-rss: 211821164kB), leading to OOM kills on the node. The
> OSD service restarts, but it continues to show excessive memory consumption
> and the OSD goes down again.
>
> Could you please provide guidance or suggestions on how to mitigate this
> issue? Are there any known memory management settings, configuration
> adjustments, or OSD-specific tuning parameters that could help prevent this
> from recurring?
>
> Any help would be greatly appreciated.
>
> Thank you for your time and assistance!
>
>
>
> Regards
> Mosharaf Hossain
> Manager, Product Development
> Bangladesh Online (BOL)
>
> Level 8, SAM Tower, Plot 4, Road 22, Gulshan 1, Dhaka 1212, Bangladesh
> Tel: +880 9609 000 999, +880 2 58815559, Ext: 14191, Fax: +880 2  95757
> Cell: +880 1787 680828, Web: www.bol-online.com
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io