Hi Tiago,

After a little more digging, we were able to find and resolve the problem.
We had a single ha_chassis group that contained a duplicate chassis member (not sure how that happened), and this was causing an update to loop forever. We used ovn-nbctl lrp-del-gateway-chassis to remove both entries and then re-added a corrected entry. As soon as this completed, the high CPU usage went away and the service became stable again.

Rgds
Steve.

On Mon, 17 Mar 2025 at 21:27, Tiago Pires <tiago.pi...@luizalabs.com> wrote:
> Hi Steven,
>
> Can you enable debug logging on ovn-northd (ovn-appctl -t ovn-northd
> vlog/set file:dbg)? Then, at the moment there is a CPU load spike, you can
> check what is being exchanged between NB and Neutron (NB <=> Neutron); you
> will probably see JSON-RPC messages.
>
> Regards,
>
> Tiago Pires
>
> On Sun, Mar 16, 2025 at 3:21 PM Steven Relf via discuss
> <ovs-discuss@openvswitch.org> wrote:
> >
> > Hi List,
> >
> > Sorry for the cross-post; I accidentally posted this to the bug list as
> > well.
> >
> > I'm looking for some help troubleshooting an issue I'm seeing as part of
> > an OpenStack install of OVN.
> >
> > Environment
> > OpenStack: 2023.2
> > OVN: 23.09.3
> >
> > Symptoms:
> > ovn-northd is sitting at about 80-90% CPU usage, with no apparent cause.
> > The log is showing the following:
> >
> > 2025-03-16T09:41:31.909Z|01411|poll_loop|INFO|wakeup due to 0-ms timeout at lib/reconnect.c:677 (82% CPU usage)
> > 2025-03-16T09:41:37.931Z|01412|poll_loop|INFO|Dropped 244 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
> > 2025-03-16T09:41:37.931Z|01413|poll_loop|INFO|wakeup due to [POLLIN] on fd 12 (10.20.3.5:48108<->10.20.3.7:6642) at lib/stream-fd.c:157 (85% CPU usage)
> >
> > When running strace against its PID:
> >
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > write(9, "\0", 1) = 1
> > sendto(12, "{\"id\":126601,\"method\":\"transact\""..., 1562, 0, NULL, 0) = 1562
> > accept(7, 0x7ffc21c15160, [128]) = -1 EAGAIN (Resource temporarily unavailable)
> > write(9, "\0", 1) = 1
> > poll([{fd=15, events=POLLIN}, {fd=7, events=POLLIN}, {fd=5, events=POLLIN}, {fd=12, events=POLLIN}], 4, 1546) = 1 ([{fd=12, revents=POLLIN}])
> > getrusage(RUSAGE_THREAD, {ru_utime={tv_sec=2357, tv_usec=89555}, ru_stime={tv_sec=62, tv_usec=784079}, ...}) = 0
> > write(9, "\0", 1) = 1
> > recvfrom(15, 0x61091006da4a, 1446, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(15, 0x61091006da4a, 1446, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, "{\"id\":null,\"method\":\"update2\",\"p"..., 1564, 0, NULL, NULL) = 1564
> > recvfrom(12, "ce416b4-d662-43e5-863d-06ccc1152"..., 4096, 0, NULL, NULL) = 388
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> > recvfrom(12, 0x610911862914, 3708, 0, NULL, NULL) = -1 EAGAIN (Resource temporarily unavailable)
> >
> > So I'm not 100% sure whether it's an OVN issue or whether neutron-server
> > is calling it too many times.
> >
> > The odd thing is that this is a lab environment with very little traffic
> > or change taking place.
> >
> > Any suggestions on troubleshooting or narrowing down the cause would be
> > gratefully received.
> >
> > Rgds
> > Steve.
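The root cause in this thread was one chassis listed twice in the same HA chassis group. Below is a minimal Python sketch for spotting that condition; it is an illustration, not an OVN tool. It assumes you have already extracted (group or router port, chassis name) pairs from the Northbound database, for example by cross-referencing the gateway_chassis column in ovn-nbctl list Logical_Router_Port output with the chassis_name column in ovn-nbctl list Gateway_Chassis output. That extraction step, the helper name find_duplicate_members, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def find_duplicate_members(
    pairs: Iterable[Tuple[str, str]],
) -> Dict[str, List[str]]:
    """Report chassis names that appear more than once for the same owner.

    Each pair is (owner, chassis_name), where "owner" identifies the group,
    e.g. a logical router port or an HA chassis group name. How the pairs are
    extracted from the NB database is an assumption left to the caller.
    """
    # Count how many times each chassis appears per owner.
    per_owner: Dict[str, Counter] = defaultdict(Counter)
    for owner, chassis in pairs:
        per_owner[owner][chassis.strip()] += 1

    # Keep only owners that list at least one chassis more than once.
    duplicates: Dict[str, List[str]] = {}
    for owner, counts in per_owner.items():
        repeated = sorted(name for name, n in counts.items() if n > 1)
        if repeated:
            duplicates[owner] = repeated
    return duplicates


if __name__ == "__main__":
    # Hypothetical data: router port "lrp-ext" lists chassis "gw-1" twice.
    sample = [
        ("lrp-ext", "gw-1"),
        ("lrp-ext", "gw-2"),
        ("lrp-ext", "gw-1"),
        ("lrp-int", "gw-3"),
    ]
    for owner, names in find_duplicate_members(sample).items():
        print(f"{owner}: duplicate chassis {', '.join(names)}")
```

A port reported here would be a candidate for the same clean-up described at the top of the thread: remove the entries with ovn-nbctl lrp-del-gateway-chassis, then re-add one with ovn-nbctl lrp-set-gateway-chassis.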
_______________________________________________
discuss mailing list
disc...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-discuss
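Tiago's suggestion, enabling debug logging with ovn-appctl -t ovn-northd vlog/set file:dbg and watching the JSON-RPC traffic during a CPU spike, produces a lot of output. Below is a rough Python sketch for summarizing it by counting how often each JSON-RPC method name appears. It assumes the debug lines carry the method as method="..." (for example transact or update2, the two methods visible in the strace output); adjust the regular expression if your log format differs. The function name tally_jsonrpc_methods and the sample lines are hypothetical placeholders, not real log output.

```python
import re
from collections import Counter
from typing import Iterable

# ASSUMPTION: JSON-RPC debug lines contain the method as method="<name>".
METHOD_RE = re.compile(r'method="([^"]+)"')


def tally_jsonrpc_methods(lines: Iterable[str]) -> Counter:
    """Count how often each JSON-RPC method name appears in the given lines."""
    counts: Counter = Counter()
    for line in lines:
        match = METHOD_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Placeholder lines; in practice, pass an open ovn-northd log file
    # (its location depends on the deployment).
    sample = [
        '... method="transact" ...',
        '... method="update2" ...',
        '... method="transact" ...',
        "a line without any method",
    ]
    for method, count in tally_jsonrpc_methods(sample).most_common():
        print(f"{count:8d}  {method}")
```

In a quiet lab, a transact count that keeps climbing alongside matching update2 notifications could suggest a loop like the one found here. Debug logging is verbose, so it is worth setting it back afterwards (the file level is usually info, e.g. ovn-appctl -t ovn-northd vlog/set file:info).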