Okay, I did a bit of digging today into a CI failure I saw on another change, and I eventually found that it was related to this bug.
So, lemme explain the issue here. First, I was looking at
https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_6f9/868236/5/gate/nova-next/6f9f3d0/
and I was wondering why the SSH connection wasn't working.

When I looked at the nova logs, I found that the instance was spawned at 18:18:56:

  Feb 14 18:18:56.514945 np0033093378 nova-compute[83239]: INFO nova.compute.manager [None req-053318ab-09ad-4a3a-8ddb-633cc0002c3e tempest-AttachVolumeNegativeTest-1605485622 tempest-AttachVolumeNegativeTest-1605485622-project] [instance: 6a265379-ebfd-4aea-a081-8b271f32c0ea] Took 8.58 seconds to build instance.

Then, Tempest tried to SSH into the instance at 18:18:59:

  2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.161:22' as 'cirros' with public key authentication

And eventually, about 3 min 32 s after that (18:22:31), it gave up:

  2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to cirros@172.24.5.161 after 16 attempts. Proxy client: no proxy client

Then, I looked at the guest console, and I saw that udhcpc tried 3 times:

  2023-02-14 18:22:39.129636 | controller | [ 12.638156] sr 0:0:0:0: Attached scsi generic sg0 type 5
  [...]
  2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3
  2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover
  2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover
  2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover

So, I was wondering how long the DHCP discovery takes, and eventually I found that the cirros DHCP client actually hangs for 1 min before requesting again.

So, now I'm wondering why it takes so much time to get a DHCP address and why the 2nd DHCP call doesn't get the IP address.
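For anyone who wants to redo the arithmetic on another failed run, here is a minimal sketch that pulls the Tempest-side timestamps out of the two SSH log lines quoted above and counts the udhcpc discover attempts in the console excerpt. The helper names (tempest_time, ssh_retry_window, count_discovers) are made up for this example and are not part of Tempest or of any CI tooling; the log lines are copied from this report.

```python
import re
from datetime import datetime, timedelta

# Log lines copied from this report. Each job-output line carries two
# timestamps: the Zuul one first, then Tempest's own after the second "|".
SSH_START = (
    "2023-02-14 18:22:39.102680 | controller | 2023-02-14 18:18:59,630 92653 "
    "INFO [tempest.lib.common.ssh] Creating ssh connection to "
    "'172.24.5.161:22' as 'cirros' with public key authentication"
)
SSH_FAIL = (
    "2023-02-14 18:22:39.103394 | controller | 2023-02-14 18:22:31,398 92653 "
    "ERROR [tempest.lib.common.ssh] Failed to establish authenticated ssh "
    "connection to cirros@172.24.5.161 after 16 attempts. "
    "Proxy client: no proxy client"
)
CONSOLE = [
    "2023-02-14 18:22:39.130384 | controller | Starting network: udhcpc: started, v1.29.3",
    "2023-02-14 18:22:39.130415 | controller | udhcpc: sending discover",
    "2023-02-14 18:22:39.130439 | controller | udhcpc: sending discover",
    "2023-02-14 18:22:39.130461 | controller | udhcpc: sending discover",
]

# Matches the Tempest timestamp ("YYYY-MM-DD HH:MM:SS,mmm") that follows a "| ".
TEMPEST_TS = re.compile(r"\| (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ")


def tempest_time(line: str) -> datetime:
    """Return the Tempest-side timestamp embedded in a job-output line."""
    match = TEMPEST_TS.search(line)
    if match is None:
        raise ValueError(f"no Tempest timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")


def ssh_retry_window(start_line: str, fail_line: str) -> timedelta:
    """Time between the first SSH attempt and the final failure."""
    return tempest_time(fail_line) - tempest_time(start_line)


def count_discovers(console_lines: list[str]) -> int:
    """Number of 'udhcpc: sending discover' lines in the guest console."""
    return sum(
        1 for line in console_lines
        if line.rstrip().endswith("udhcpc: sending discover")
    )


if __name__ == "__main__":
    window = ssh_retry_window(SSH_START, SSH_FAIL)
    # 211.768 s, i.e. about 3 min 32 s of SSH retries
    print(f"SSH retry window: {window.total_seconds():.3f} s")
    print(f"udhcpc discover attempts: {count_discovers(CONSOLE)}")
```

Note that the Zuul timestamps on the console lines only say when the console dump was written to the job output, so they can't be used to time the DHCP attempts themselves.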
Adding the Neutron team to this bug report because there may be a problem with our DHCP handling.

** Also affects: neutron
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2006467

Title:
  tempest ssh timeout due to udhcpc fails in the cirros guest

Status in neutron:
  New
Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  Tests trying to ssh into the guest fail intermittently with a timeout
  as udhcpc fails in the guest:

  2023-02-01 20:46:32.286979 | controller | Starting network: udhcpc: started, v1.29.3
  2023-02-01 20:46:32.286987 | controller | udhcp
  2023-02-01 20:46:32.286996 | controller | c: sending discover
  2023-02-01 20:46:32.287004 | controller | udhcpc: sending discover
  2023-02-01 20:46:32.287013 | controller | udhcpc: sending discover
  2023-02-01 20:46:32.287022 | controller | Usage: /sbin/cirros-dhcpc <up|down>
  2023-02-01 20:46:32.287030 | controller | udhcpc: no lease, failing
  2023-02-01 20:46:32.287039 | controller | FAIL

  Traceback (most recent call last):
    File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
      return f(*func_args, **func_kwargs)
    File "/opt/stack/tempest/tempest/api/compute/admin/test_volumes_negative.py", line 128, in test_multiattach_rw_volume_update_failure
      server1 = self.create_test_server(
    File "/opt/stack/tempest/tempest/api/compute/base.py", line 272, in create_test_server
      body, servers = compute.create_test_server(
    File "/opt/stack/tempest/tempest/common/compute.py", line 334, in create_test_server
      with excutils.save_and_reraise_exception():
    File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 227, in __exit__
      self.force_reraise()
    File "/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/oslo_utils/excutils.py", line 200, in force_reraise
      raise self.value
    File "/opt/stack/tempest/tempest/common/compute.py", line 329, in create_test_server
      wait_for_ssh_or_ping(
    File "/opt/stack/tempest/tempest/common/compute.py", line 148, in wait_for_ssh_or_ping
      waiters.wait_for_ssh(
    File "/opt/stack/tempest/tempest/common/waiters.py", line 632, in wait_for_ssh
      raise lib_exc.TimeoutException()
  tempest.lib.exceptions.TimeoutException: Request timed out
  Details: None

  Example failure
  https://zuul.opendev.org/t/openstack/build/f1c6b7e54b28415c952de0be833731a9/logs

  Signature

  $ logsearch log --job-group nova-devstack --result FAILURE 'udhcpc: no lease, failing' --days 7
  [snip]
  Builds with matching logs 6/138:
  +----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+
  | uuid                             | finished            | project        | pipeline | review                            | branch         | job                                        |
  +----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+
  | 9bd5d568bfa84c119470df9fbff2de0b | 2023-02-03T12:36:54 | openstack/nova | check    | https://review.opendev.org/857339 | master         | nova-next                                  |
  | 3fae6edffe68483fa2627bc40002f524 | 2023-02-02T13:52:04 | openstack/nova | check    | https://review.opendev.org/860285 | master         | nova-next                                  |
  | 70eeeb8eb3184d8d9ee802ee53cb979b | 2023-02-02T13:33:57 | openstack/nova | check    | https://review.opendev.org/860287 | master         | nova-next                                  |
  | 492821b715974ae389c5d7f9127bb5c3 | 2023-02-02T05:14:11 | openstack/nova | check    | https://review.opendev.org/871798 | stable/wallaby | tempest-integrated-compute-centos-8-stream |
  | f1c6b7e54b28415c952de0be833731a9 | 2023-02-01T21:34:36 | openstack/nova | gate     | https://review.opendev.org/872220 | master         | nova-next                                  |
  | cca45d74a56f4204a299ee4bbbaad59d | 2023-02-01T06:17:04 | openstack/nova | check    | https://review.opendev.org/871557 | stable/wallaby | tempest-integrated-compute-centos-8-stream |
  +----------------------------------+---------------------+----------------+----------+-----------------------------------+----------------+--------------------------------------------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2006467/+subscriptions

-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp