On Wed, Oct 30, 2024, 8:24 AM Chris Palmer <chris.pal...@idnet.com> wrote:

> I've just upgraded a test cluster from 18.2.4 to 19.2.0.  Package
> install on CentOS 9 Stream. Very smooth upgrade. Only one problem so far...
>
> The MGR restful API calls work fine, EXCEPT whenever the balancer kicks
> in to look for new plans. During the few seconds that the balancer takes
> to run, all REST calls seem to be completely dropped. The MGR log file
> normally logs the POST requests, but the ones during these few seconds
> don't appear at all. This causes our monitoring to keep raising alarms.
>
> The cluster is in a completely stable state, HEALTH_OK, very little
> activity, just the occasional scrubs.
>
> We use the restful API for monitoring (using the Ceph for Zabbix Agent 2
> plugin, as Zabbix is the over-arching monitoring platform in the data
> centre). I haven't yet checked the memory leak problems that we (like
> many) were having, because I have been chasing this new problem.
>
> The problem is quite repeatable. To diagnose it I use the zabbix_get
> utility to query every second. Whenever the MGR log file shows the
> balancer kicking in, the REST requests time out (after 3 seconds; I'm not
> sure whether the utility or the MGR is timing them out, but I suspect the
> utility). They normally complete in a small fraction of a second.
> With the balancer disabled the REST interface works reliably again.
>
> The problem does not occur pre-squid.
>
> Anyone any ideas, or shall I raise a bug?
>
> Thanks, Chris
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


There's a (suspected) algorithmic issue in how upmaps are processed,
introduced by a change in Squid. It sounds like you're hitting that. I'd
suggest disabling the balancer until the issue is addressed in a subsequent
Squid release.
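
To confirm the correlation without Zabbix in the loop, the once-a-second
probe described in the quoted message can be approximated with a small
script. This is only a sketch under stated assumptions: the URL (host and
port 8003) is a placeholder for wherever the restful module listens, the
3-second timeout mirrors the one observed with zabbix_get, and the names
probe_once and find_stalls are made up for illustration. It sends a plain
GET rather than the POST requests the Zabbix plugin issues, so it only
shows whether the MGR answers at all.

```python
import ssl
import time
import urllib.error
import urllib.request
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, request answered?)

# Placeholder endpoint: adjust host and port to your restful module setup.
URL = "https://mgr.example.invalid:8003/"


def probe_once(url: str, timeout: float = 3.0) -> bool:
    """Return True if the server sends any HTTP response within `timeout`."""
    ctx = ssl.create_default_context()
    # Assumption: the MGR uses a self-signed certificate, so skip verification.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (e.g. 401) still means the MGR answered.
        return True
    except OSError:
        # Timeouts, refused or reset connections, and URL errors land here.
        return False


def find_stalls(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Group consecutive failed probes into (first_failure, last_failure)."""
    stalls: List[Tuple[float, float]] = []
    start = None
    last_fail = None
    for ts, ok in samples:
        if not ok:
            if start is None:
                start = ts
            last_fail = ts
        elif start is not None:
            stalls.append((start, last_fail))
            start = None
    if start is not None:
        stalls.append((start, last_fail))
    return stalls


if __name__ == "__main__":
    samples: List[Sample] = []
    for _ in range(120):  # roughly two minutes of probing
        samples.append((time.time(), probe_once(URL)))
        time.sleep(1.0)
    for first, last in find_stalls(samples):
        print("no answer from", time.ctime(first), "to", time.ctime(last))
```

The printed windows can then be compared by hand against the balancer
entries in the MGR log. If they line up, `ceph balancer off` switches the
balancer off and `ceph balancer status` shows its current state.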

Tyler