Hi Tiago,

After a little more digging we were able to find and resolve the problem.

We had a single ha_chassis group with a duplicate chassis member (not
sure how that happened), and this was causing an update to loop forever.
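
For anyone hitting the same symptom, a minimal sketch of how such a
duplicate could be spotted is below. It assumes the group membership has
already been extracted into plain Python data (for example by hand from
ovn-nbctl list output); the function name, the input shape and the sample
names are made up for illustration and are not taken from this environment.

```python
from collections import Counter


def find_duplicate_members(groups):
    """Return {group_name: [chassis, ...]} for chassis listed more than once.

    `groups` maps a group name to the list of chassis names in it.
    This input shape is an assumption for illustration, not an OVN format.
    """
    duplicates = {}
    for group_name, members in groups.items():
        counts = Counter(members)
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            duplicates[group_name] = repeated
    return duplicates


# Hypothetical data: one group lists "chassis-a" twice.
example = {
    "group-1": ["chassis-a", "chassis-b", "chassis-a"],
    "group-2": ["chassis-a", "chassis-b"],
}
print(find_duplicate_members(example))
```

Only groups that actually contain a repeated member are reported, so an
empty result means no duplicates were found in the data supplied.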

We used ovn-nbctl lrp-del-gateway-chassis to remove both entries, and then
re-added a corrected entry.
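
Roughly, the clean-up has the shape sketched below. This is only an
illustration: the port name, chassis name and priority are placeholders,
lrp-set-gateway-chassis is assumed for the re-add step (only the delete
command is named above), and the helper merely builds the argument lists
instead of running anything.

```python
import shlex


def build_cleanup_commands(port, chassis, priority):
    """Build ovn-nbctl argument lists to drop and re-add a gateway chassis."""
    delete = ["ovn-nbctl", "lrp-del-gateway-chassis", port, chassis]
    # Assumption: the re-add uses lrp-set-gateway-chassis PORT CHASSIS [PRIORITY].
    readd = ["ovn-nbctl", "lrp-set-gateway-chassis", port, chassis, str(priority)]
    return [delete, readd]


# Placeholder names; substitute the real router port and chassis.
for argv in build_cleanup_commands("lrp-example", "chassis-a", 1):
    print(shlex.join(argv))
```

Whether the delete has to be repeated for each duplicate entry is not
something this sketch tries to answer; checking the port with
ovn-nbctl lrp-get-gateway-chassis between steps is the safer route.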

As soon as this completed, the high CPU usage went away and the service
became stable again.

Rgds
Steve.


On Mon, 17 Mar 2025 at 21:27, Tiago Pires <tiago.pi...@luizalabs.com> wrote:

> Hi Steven,
>
> Can you enable debug logging on northd (ovn-appctl -t ovn-northd
> vlog/set file:dbg)? When there is a CPU load spike, you can then check
> what is being exchanged between them (NB <=> Neutron); you will
> probably see JSON-RPC messages.
>
> Regards,
>
> Tiago Pires
>
> On Sun, Mar 16, 2025 at 3:21 PM Steven Relf via discuss
> <ovs-discuss@openvswitch.org> wrote:
> >
> > Hi List,
> >
> > Sorry for the cross post, I accidentally posted this to the bug list
> > as well.
> >
> > Looking for some help troubleshooting an issue I'm seeing as part of
> > an OpenStack install of OVN.
> >
> > Environment:
> > OpenStack: 2023.2
> > OVN: 23.09.3
> >
> > Symptoms:
> > ovn-northd is sitting at about 80-90% CPU usage, with no apparent
> > cause. The log shows the following:
> >
> > 2025-03-16T09:41:31.909Z|01411|poll_loop|INFO|wakeup due to 0-ms timeout at lib/reconnect.c:677 (82% CPU usage)
> > 2025-03-16T09:41:37.931Z|01412|poll_loop|INFO|Dropped 244 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> > 2025-03-16T09:41:37.931Z|01413|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (10.20.3.5:48108<->10.20.3.7:6642) at lib/stream-fd.c:157 (85% CPU usage)
> >
> > Running strace against its PID shows:
> >
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > write(9, "\0", 1)                       = 1
> > sendto(12, "{\"id\":126601,\"method\":\"transact\""..., 1562, 0, NULL, 0) = 1562
> > accept(7, 0x7ffc21c15160, [128])        = -1 EAGAIN (Resource temporarily unavailable)
> > write(9, "\0", 1)                       = 1
> > poll([{fd=15, events=POLLIN}, {fd=7, events=POLLIN}, {fd=5, events=POLLIN}, {fd=12, events=POLLIN}], 4, 1546) = 1 ([{fd=12, revents=POLLIN}])
> > getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=2357, tv_usec=89555}, ru_stime={tv_sec=62, tv_usec=784079}, ...}) = 0
> > write(9, "\0", 1)                       = 1
> > recvfrom(15, 0x61091006da4a, 1446, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(15, 0x61091006da4a, 1446, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, "{\"id\":null,\"method\":\"update2\",\"p"..., 1564, 0, NULL, NULL) = 1564
> > recvfrom(12, "ce416b4-d662-43e5-863d-06ccc1152"..., 4096, 0, NULL, NULL) = 388
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> >
> > So I'm not 100% sure whether it's an OVN issue or neutron-server
> > calling it too many times.
> >
> > The odd thing is this is a lab environment with very little traffic
> > or change taking place.
> >
> > Any suggestions on troubleshooting or narrowing down the cause would
> > be gratefully received.
> >
> > Rgds
> > Steve.
> >
> > This email contains information, which is private and confidential,
> > and is intended for the person(s) named above. All commercial rights
> > to the content included herein are owned exclusively by Nscale Global
> > Holdings Limited or its affiliates (collectively, "Nscale"). Any use,
> > distribution, copying, or disclosure by any other person without the
> > prior written permission of Nscale is strictly prohibited. If you have
> > received this email in error or you do not consent to receiving
> > messages of this kind, then please inform me as soon as possible.
> > _______________________________________________
> > discuss mailing list
> > disc...@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
>
> --
>
> 'This message is intended only for the addresses listed in the initial
> header. If you are not listed among the addresses in the header, we ask
> that you completely disregard the content of this message; copying,
> forwarding and/or carrying out the actions mentioned in it are
> immediately void and prohibited.'
>
> 'Although Magazine Luiza takes all reasonable precautions to ensure
> that no virus is present in this e-mail, the company cannot accept
> responsibility for any loss or damage caused by this e-mail or its
> attachments.'
