Hi, and thanks for reporting this issue.

Based on the information available, it looks like this bug has already
been resolved. I’m proposing to mark it as Fix Released.

If you believe the issue still persists or was not correctly addressed,
please feel free to set the bug status back to New and provide
additional details.

Thanks!

** Changed in: nova
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2092297

Title:
  Nova-compute service state flapping down/up

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  We're running Antelope (2023.1) in two environments, test and
  production. The issue manifests in both.

  The nova-compute service state started to fail, or rather to flap
  down/up at seemingly random (but frequent and shortening) intervals,
  complaining about Rabbit connectivity. No other services have issues
  with Rabbit. Once the nova-compute container is restarted on a specific
  compute host, the issue seems to be solved; however, after a few days
  it starts recurring more and more often, with the interval gradually
  shortening.

  When the service goes down, we observe the following log sequence
  (communication between nova and rabbit, filtered for a specific id to
  trace a single thread):

  Initial connection to rabbit:

  Nov 29, 2024 @ 11:12:18.471 info controller-1 rabbit <0.17767.1487> connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Nov 29, 2024 @ 11:12:18.471 info controller-1 rabbit <0.17767.1487> connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'

  11 days of silence.

  First occurrence:

  Dec 10, 2024 @ 13:16:42.395 info controller-1 rabbit <0.16505.2454> connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'
  Dec 10, 2024 @ 13:16:40.775 info controller-1 rabbit <0.16505.2454> connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Dec 10, 2024 @ 13:16:40.681 INFO compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] Reconnected to AMQP server on 172.16.4.22:5672 via [amqp] client with port 34228.
  Dec 10, 2024 @ 13:16:39.669 ERROR compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
  Dec 10, 2024 @ 13:16:34.610 error controller-1 rabbit <0.17767.1487> closing AMQP connection <0.17767.1487> (172.16.4.52:33664 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63):

  Second occurrence 5 hours later; after that the interval shortens.

  Dec 10, 2024 @ 18:03:43.279 info controller-1 rabbit <0.31431.2484> connection <0.31431.2484> (172.16.4.52:42630 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63): user 'openstack' authenticated and granted access to vhost '/'
  Dec 10, 2024 @ 18:03:42.639 info controller-1 rabbit <0.31431.2484> connection <0.31431.2484> (172.16.4.52:42630 -> 172.16.4.22:5672) has a client-provided name: nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63
  Dec 10, 2024 @ 18:03:42.545 INFO compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] Reconnected to AMQP server on 172.16.4.22:5672 via [amqp] client with port 42630.
  Dec 10, 2024 @ 18:03:41.534 ERROR compute-36 nova-compute [671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: amqp.exceptions.RecoverableConnectionError: <RecoverableConnectionError: unknown error>
  Dec 10, 2024 @ 18:03:41.063 error controller-1 rabbit <0.16505.2454> closing AMQP connection <0.16505.2454> (172.16.4.52:34228 -> 172.16.4.22:5672 - nova-compute:7:671a3304-8303-4e01-a1ab-3990e1869a63):

  This id seems to keep erroring to this day: 700+ log entries today.
  Each time the connection is closed, nova complains about rabbit being
  unreachable and reconnects after a few attempts.
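
  To quantify how the interval shortens, the gaps between successive
  "is unreachable" errors for one connection id could be computed along
  the lines of the sketch below. It is only an illustration: it assumes
  each log entry sits on a single line and uses the timestamp format
  shown in the excerpts above. The function names and the SAMPLE list
  are hypothetical helpers, not part of nova or oslo.messaging.

```python
import re
from datetime import datetime

# Timestamp layout as it appears in the excerpts, e.g. "Dec 10, 2024 @ 13:16:39.669".
# Assumption: month abbreviations are English (default C locale).
TS_RE = re.compile(r"\w{3} \d{1,2}, \d{4} @ \d{2}:\d{2}:\d{2}\.\d{3}")
TS_FMT = "%b %d, %Y @ %H:%M:%S.%f"


def disconnect_times(lines, conn_id):
    """Return sorted timestamps of 'is unreachable' errors for one connection id."""
    times = []
    for line in lines:
        if conn_id in line and "is unreachable" in line:
            match = TS_RE.search(line)
            if match:
                times.append(datetime.strptime(match.group(0), TS_FMT))
    return sorted(times)


def intervals(times):
    """Return the gaps between consecutive timestamps as timedelta objects."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


# The two nova-compute ERROR entries quoted in this report, one entry per line.
SAMPLE = [
    "Dec 10, 2024 @ 18:03:41.534 ERROR compute-36 nova-compute "
    "[671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 "
    "is unreachable: <RecoverableConnectionError: unknown error>. "
    "Trying again in 1 seconds.",
    "Dec 10, 2024 @ 13:16:39.669 ERROR compute-36 nova-compute "
    "[671a3304-8303-4e01-a1ab-3990e1869a63] AMQP server on 172.16.4.22:5672 "
    "is unreachable: <RecoverableConnectionError: unknown error>. "
    "Trying again in 1 seconds.",
]

times = disconnect_times(SAMPLE, "671a3304-8303-4e01-a1ab-3990e1869a63")
for gap in intervals(times):
    print(gap)
```

  Fed with the full log for a compute host, a steadily decreasing list
  of gaps would confirm the shortening pattern described above.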

  Nova container build: 2023.1 commit 47428f6caf503b94583dac614b59971f60a0ba9c
  Rabbit version: 3.11.28 on Erlang 25.3.2.12
  Hypervisor: libvirt + kvm
  Storage: ceph
  Network: OVN

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2092297/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
