Hi, and thanks for reporting this issue. Based on the information available, it looks like this bug has already been resolved. I’m proposing to mark it as Fix Released.
If you believe the issue still persists or was not correctly addressed, please feel free to set the bug status back to New and provide additional details. Thanks!

** Changed in: nova
       Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2092297

Title:
  Nova-compute service state flapping down/up

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We're running Antelope (2023.1) on two environments, test and production. The issue manifests on both.

  The nova-compute service state started to fail, or rather to flap down/up at seemingly random (but frequent and shortening) intervals, complaining about Rabbit connectivity. No other services have issues with Rabbit.

  Once the nova-compute container is restarted on a specific compute host the issue seems to be solved, but after a few days it starts recurring more and more often, with the interval gradually shortening.

  When the service goes down we observe the following log sequence (communication between nova and rabbit, filtered for a specific id to trace a single thread).

  Initial connection to rabbit:

  Nov 29, 2024 @ 11:12:18.471 info controller-1 rabbit <0.17767.1487> connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Nov 29, 2024 @ 11:12:18.471 info controller-1 rabbit <0.17767.1487> connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'

  11 days of silence.

  First occurrence:

  Dec 10, 2024 @ 13:16:42.395 info controller-1 rabbit <0.16505.2454> connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'
  Dec 10, 2024 @ 13:16:40.775 info controller-1 rabbit <0.16505.2454> connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Dec 10, 2024 @ 13:16:40.681 INFO compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] Reconnected to AMQP server on 172.16.4.22:5672 via [amqp] client with port 34228.
  Dec 10, 2024 @ 13:16:39.669 ERROR compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
  Dec 10, 2024 @ 13:16:34.610 error controller-1 rabbit <0.17767.1487> closing AMQP connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63):

  Second occurrence 5 hours later; after that the interval shortens.

  Dec 10, 2024 @ 18:03:43.279 info controller-1 rabbit <0.31431.2484> connection <0.31431.2484> (172.16.4.52:42630 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'
  Dec 10, 2024 @ 18:03:42.639 info controller-1 rabbit <0.31431.2484> connection <0.31431.2484> (172.16.4.52:42630 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Dec 10, 2024 @ 18:03:42.545 INFO compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] Reconnected to AMQP server on 172.16.4.22:5672 via [amqp] client with port 42630.
  Dec 10, 2024 @ 18:03:41.534 ERROR compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
  Dec 10, 2024 @ 18:03:41.063 error controller-1 rabbit <0.16505.2454> closing AMQP connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63):

  This id seems to keep erroring to this day: 700+ log entries today. Each time the connection is closed, nova complains about rabbit being unreachable and reconnects after a few attempts.

  Nova container build: 2023.1 commit 47428f6caf503b94583dac614b59971f60a0ba9c
  Rabbit version: 3.11.28 on Erlang 25.3.2.12
  Hypervisor: libvirt + kvm
  Storage: ceph
  Network: OVN

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2092297/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp