[Yahoo-eng-team] [Bug 1944947] [NEW] NUMA instance with shared CPU policy cannot be restarted after upgrade to Victoria
Public bug reported:

Description
===========
NUMA instances without hw:cpu_policy=dedicated set cannot be restarted
after upgrading from Ussuri to Victoria. The nova-compute service fails
with the following exception:

    NotImplementedError: Cannot load 'pcpuset' in the base class

Steps to reproduce
==================
* Deploy Nova using Ussuri
* Upgrade Nova to Victoria
* openstack server stop
* openstack server start

Expected result
===============
Instance should start.

Actual result
=============
Instance doesn't start.

Environment
===========
python3-nova-22.2.2-1.el8.noarch
openstack-nova-compute-22.2.2-1.el8.noarch
Packaged as Docker images by Kolla.

Logs & Configs
==============
The following trace is logged: https://paste.openstack.org/show/809556/

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1944947

Title:
  NUMA instance with shared CPU policy cannot be restarted after
  upgrade to Victoria

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===========
  NUMA instances without hw:cpu_policy=dedicated set cannot be restarted
  after upgrading from Ussuri to Victoria. The nova-compute service fails
  with the following exception:

      NotImplementedError: Cannot load 'pcpuset' in the base class

  Steps to reproduce
  ==================
  * Deploy Nova using Ussuri
  * Upgrade Nova to Victoria
  * openstack server stop
  * openstack server start

  Expected result
  ===============
  Instance should start.

  Actual result
  =============
  Instance doesn't start.

  Environment
  ===========
  python3-nova-22.2.2-1.el8.noarch
  openstack-nova-compute-22.2.2-1.el8.noarch
  Packaged as Docker images by Kolla.

  Logs & Configs
  ==============
  The following trace is logged: https://paste.openstack.org/show/809556/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1944947/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
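[Editor's note on the exception: Victoria added a 'pcpuset' field to Nova's instance NUMA cell object, and a NUMA topology serialized by Ussuri doesn't carry it, so reading the attribute falls through to oslo.versionedobjects' base lazy-loader, which raises exactly this error. A minimal sketch of that failure mode follows; DemoNUMACell and its fields are illustrative, not Nova's actual InstanceNUMACell.]

# Sketch of the oslo.versionedobjects lazy-load failure seen in the trace.
from oslo_versionedobjects import base as ovo_base
from oslo_versionedobjects import fields


@ovo_base.VersionedObjectRegistry.register
class DemoNUMACell(ovo_base.VersionedObject):
    # Pretend a newer release added 'pcpuset' alongside the older 'cpuset'.
    fields = {
        'cpuset': fields.Field(fields.List(fields.Integer())),
        'pcpuset': fields.Field(fields.List(fields.Integer())),
    }


# An object deserialized from a pre-upgrade record never had 'pcpuset' set:
cell = DemoNUMACell(cpuset=[0, 1])

try:
    cell.pcpuset  # unset field -> falls back to the base obj_load_attr()
except NotImplementedError as exc:
    print(exc)  # Cannot load 'pcpuset' in the base class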
[Yahoo-eng-team] [Bug 1944948] [NEW] IPv6 slaac subnet creation causes FixedIpsSubnetsNotOnSameSegment error
Public bug reported:

When trying to create an IPv6 SLAAC subnet in a multisegment network,
Neutron raises the FixedIpsSubnetsNotOnSameSegment error. But the
subnet is actually created.

Steps to reproduce:
$ openstack network create --share --provider-network-type geneve --provider-segment 777 test_net
SEGMENT=`openstack network segment list --network test_net | awk '/777/ {print $2}'`
$ openstack network segment set --name segment777 $SEGMENT
$ openstack network segment create --network-type geneve --segment 778 --network test_net segment778
$ openstack subnet create --network test_net --network-segment segment777 --ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
$ openstack subnet create --network test_net --network-segment segment778 --ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
$ openstack subnet create --network test_net --network-segment segment777 --ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode slaac segment777-v6

Expected result:
Subnet created with no errors

Actual result:
Subnet created, but the API throws an exception:
BadRequestException: 400: Client Error for url: http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from different segments.

Version:
- Devstack (Neutron master)
- OVN 21.03

There's a Bugzilla about this topic [0]
[0] https://bugzilla.redhat.com/show_bug.cgi?id=1939601

** Affects: neutron
   Importance: Undecided
   Assignee: Elvira García Ruiz (elviragr)
   Status: New

** Description changed:

  When tries to create IPv6 SLAAC subnet in multisegment network it
  raises the FixedIpsSubnetsNotOnSameSegment error. But the subnet is
  actually created.

  Steps to reproduce:
-
- openstack network create --share --provider-network-type geneve --provider-segment 777 test_net
+ $ openstack network create --share --provider-network-type geneve --provider-segment 777 test_net
  SEGMENT=`openstack network segment list --network test_net | awk '/777/ {print $2}'`
- openstack network segment set --name segment777 $SEGMENT
- openstack network segment create --network-type geneve --segment 778 --network test_net segment778
- openstack subnet create --network test_net --network-segment segment777 --ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
- openstack subnet create --network test_net --network-segment segment778 --ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
- openstack subnet create --network test_net --network-segment segment777 --ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode slaac segment777-v6
+ $ openstack network segment set --name segment777 $SEGMENT
+ $ openstack network segment create --network-type geneve --segment 778 --network test_net segment778
+ $ openstack subnet create --network test_net --network-segment segment777 --ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
+ $ openstack subnet create --network test_net --network-segment segment778 --ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
+ $ openstack subnet create --network test_net --network-segment segment777 --ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode slaac segment777-v6

  Expected result:
  Subnet created with no errors

  Actual result:
  Subnet created, but API throws an exception:
-
- BadRequestException: 400: Client Error for url:
- http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from
- different segments.
+ BadRequestException: 400: Client Error for url: http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from different segments.

  Version:
- - Devstack (Neutron master)
+ - Devstack (Neutron master)
  - OVN 21.03

  There's a Bugzilla about this topic [0]
  [0] https://bugzilla.redhat.com/show_bug.cgi?id=1939601

** Changed in: neutron
     Assignee: (unassigned) => Elvira García Ruiz (elviragr)

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944948

Title:
  IPv6 slaac subnet creation causes FixedIpsSubnetsNotOnSameSegment
  error

Status in neutron:
  New

Bug description:
  When trying to create an IPv6 SLAAC subnet in a multisegment network,
  Neutron raises the FixedIpsSubnetsNotOnSameSegment error. But the
  subnet is actually created.

  Steps to reproduce:
  $ openstack network create --share --provider-network-type geneve --provider-segment 777 test_net
  SEGMENT=`openstack network segment list --network test_net | awk '/777/ {print $2}'`
  $ openstack network segment set --name segment777 $SEGMENT
  $ openstack network segment create --network-type geneve --segment 778 --network test_net segment778
  $ openstack subnet create --network test_net --network-segment segment777 --ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
  $ openstack subnet create --network test_net --network-segm
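[Editor's note: the "Cannot allocate addresses from different segments" message suggests the SLAAC create is touching existing ports. For auto-address (SLAAC) subnets, Neutron derives an EUI-64 address in the new prefix for every port already on the network, so ports bound to segment778 would fail the segment check. The derivation itself can be reproduced with oslo.utils; the MAC below is a hypothetical port MAC, the prefix is from the reproducer.]

# EUI-64 SLAAC address a port would be assigned in the new prefix.
from oslo_utils import netutils

prefix = '2001:10:77:7::/64'   # segment777-v6
mac = 'fa:16:3e:12:34:56'      # hypothetical Neutron port MAC

addr = netutils.get_ipv6_addr_by_EUI64(prefix, mac)
print(addr)  # 2001:10:77:7:f816:3eff:fe12:3456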
[Yahoo-eng-team] [Bug 1944201] Re: neutron-openvswitch-agent crashes on start with firewall config of br-int
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/810592
Committed: https://opendev.org/openstack/neutron/commit/57629dc05122b01a9ba76606b8d75cce9da40776
Submitter: "Zuul (22348)"
Branch:    master

commit 57629dc05122b01a9ba76606b8d75cce9da40776
Author: Rodolfo Alonso Hernandez
Date:   Thu Sep 23 09:15:09 2021 +

    Add retry when executing OF commands if "InvalidDatapath"

    When using the OF API (currently Neutron only uses the native
    implementation via the "os-ken" library), retry the command in
    case of an "InvalidDatapath" exception.

    As commented in the related bug, some operations could restart the
    OF controller (set the OF protocols, set the bridge fail mode).
    During the controller restart, a command can return an
    "InvalidDatapath" exception.

    Closes-Bug: #1944201
    Change-Id: Ia8d202f8a38362272e9519c1cbd9d6ba9359e0a1

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944201

Title:
  neutron-openvswitch-agent crashes on start with firewall config of
  br-int

Status in neutron:
  Fix Released

Bug description:
  In upstream CI, Ironic jobs have been encountering failures where we
  never find the networking to be stood up by neutron. Investigation
  into what was going on led us to finding the neutron-openvswitch-agent
  in a failed state, exited due to RuntimeError, just a few seconds
  after the service was started.

  neutron-openvswitch-agent[78787]: DEBUG neutron.agent.securitygroups_rpc [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] Init firewall settings (driver=openvswitch) {{(pid=78787) init_firewall /opt/stack/neutron/neutron/agent/securitygroups_rpc.py:118}}
  neutron-openvswitch-agent[78787]: DEBUG ovsdbapp.backend.ovs_idl.transaction [-] Running txn n=1 command(idx=0): DbAddCommand(table=Bridge, record=br-int, column=protocols, values=('OpenFlow10', 'OpenFlow11', 'OpenFlow12', 'OpenFlow13', 'OpenFlow14')) {{(pid=78787) do_commit /usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py:90}}
  neutron-openvswitch-agent[78787]: ERROR OfctlService [-] unknown dpid 90695823979334
  neutron-openvswitch-agent[78787]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334: os_ken.app.ofctl.exception.InvalidDatapath: Datapath Invalid 90695823979334
  neutron-openvswitch-agent[78787]: ERROR neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334 agent terminated!: RuntimeError: ofctl request version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1) error Datapath Invalid 90695823979334
  systemd[1]: devstack@q-agt.service: Main process exited, code=exited, status=1/FAILURE
  systemd[1]: devstack@q-agt.service: Failed with result 'exit-code'.

  Originally, this was thought to be related to
  https://bugs.launchpad.net/neutron/+bug/1817022; however, this occurs
  upon service startup on a relatively low-load machine where the only
  action really happening at that time is neutron starting. Also, since
  the agent is just starting, the connections have not existed long
  enough for inactivity idle triggers to occur.

  Investigation allowed us to identify the general path of what is
  occurring, yet we don't understand why, at least in the Ironic
  community.

  init_firewall() invocation:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L70

  Firewall class launch:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L121

  As the default for the firewall driver ends up sending us into
  openvswitch's firewall code:
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L548
  https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L628

  Which eventually ends up in
  https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.
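[Editor's note: the merged fix wraps OF commands in a retry on InvalidDatapath. A minimal sketch of that approach using tenacity (already a Neutron dependency); send_ofctl_request below is a hypothetical stand-in for the real sender in ofswitch.py.]

import tenacity
from os_ken.app.ofctl import exception as ofctl_exc


@tenacity.retry(
    retry=tenacity.retry_if_exception_type(ofctl_exc.InvalidDatapath),
    wait=tenacity.wait_fixed(1),
    stop=tenacity.stop_after_attempt(3),
    reraise=True,
)
def send_ofctl_request(ofctl_app, request):
    """Send an ofctl request, retrying while the OF controller restarts.

    Operations such as setting the OF protocols or the bridge fail mode
    can restart the controller; during the restart os-ken may raise
    InvalidDatapath for a momentarily unregistered datapath, so a short
    retry rides it out instead of killing the agent.
    """
    # The real sender lives in neutron's ofswitch.py; this stand-in marks
    # where os_ken.app.ofctl.api.send_msg() would be invoked.
    ...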
[Yahoo-eng-team] [Bug 1943029] Re: [OVN][FT] During the OvnNorthd stop, the unix file could not exist
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/807862
Committed: https://opendev.org/openstack/neutron/commit/1762ed883447fefbf08e11c309fc1f098374ea53
Submitter: "Zuul (22348)"
Branch:    master

commit 1762ed883447fefbf08e11c309fc1f098374ea53
Author: Rodolfo Alonso Hernandez
Date:   Wed Sep 8 11:46:58 2021 +

    [OVN][FT] Check UNIX socket file before using it

    When stopping "OvnNorthd", first check if the UNIX socket file is
    still present before calling the "ovs-appctl" command.

    Closes-Bug: #1943029
    Change-Id: Id70df8d28a258108acf88f36b2fb59b5df3a0857

** Changed in: neutron
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943029

Title:
  [OVN][FT] During the OvnNorthd stop, the unix file could not exist

Status in neutron:
  Fix Released

Bug description:
  During the OVN functional tests, when the "OvnNorthd" is stopped at
  the end of the test, the UNIX ctl file may have already been deleted.
  Check first before executing the "ovs-appctl" command.

  E.g.:
  Running command: ['ovs-appctl', '-t', '/tmp/tmpo321snqk/ovnnb_db.ctl', 'exit']
  Exit code: 1; Cmd: ['ovs-appctl', '-t', '/tmp/tmpo321snqk/ovnnb_db.ctl', 'exit']; Stdin: ; Stdout: ; Stderr: 2021-09-08T11:18:44Z|1|unixctl|WARN|failed to connect to /tmp/tmpo321snqk/ovnnb_db.ctl
  ovs-appctl: cannot connect to "/tmp/tmpo321snqk/ovnnb_db.ctl" (No such file or directory)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1943029/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
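[Editor's note: a minimal sketch of the fix's idea, assuming the ctl-file path from the log; the helper below is illustrative, not the functional-test code itself.]

import os
import subprocess


def stop_ovn_northd(ctl_path):
    """Ask a daemon to exit via ovs-appctl, tolerating an early exit."""
    # If the daemon already stopped, it removed its UNIX ctl socket;
    # calling ovs-appctl would only fail with "No such file or directory".
    if not os.path.exists(ctl_path):
        return
    subprocess.run(['ovs-appctl', '-t', ctl_path, 'exit'], check=True)


# Example with the path from the log above:
stop_ovn_northd('/tmp/tmpo321snqk/ovnnb_db.ctl')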
[Yahoo-eng-team] [Bug 1943977] Re: os-brick fails to retry "iscsiadm -m session" when iser_use_multipath and iscsi_use_multipath are set in nova.conf
** Project changed: nova => os-brick

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1943977

Title:
  os-brick fails to retry "iscsiadm -m session" when iser_use_multipath
  and iscsi_use_multipath are set in nova.conf

Status in os-brick:
  New

Bug description:
  When attempting to live migrate a VM to a new compute node, the
  migration fails and the VM remains on the original compute node.
  Reviewing the logs, I can see the iSCSI session is currently not
  connected and needs to be brought up, but it then fails with "ERROR
  oslo_messaging.rpc.server TargetPortalNotFound: Unable to find target
  portal 1.1.1.1:3260".

  With "iser_use_multipath" and "iscsi_use_multipath" set false in
  nova.conf, I can see the initial os-brick attempt fails with "No
  Active sessions", as iscsid has yet to bring up the session, but the
  second try from os-brick then succeeds, as by this time iscsid has
  brought the session up.

  With "iser_use_multipath" and "iscsi_use_multipath" set true in
  nova.conf, I can see the initial os-brick attempt fails with "No
  Active sessions", as iscsid has yet to bring up the session, but
  there is no second attempt from os-brick, which leads to the
  "TargetPortalNotFound".

  I'm running "os_brick-2.5.10". Should os-brick retry when using
  multipath in nova.conf?

To manage notifications about this bug go to:
https://bugs.launchpad.net/os-brick/+bug/1943977/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
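[Editor's note: the retry the reporter is asking about could look like the sketch below. This is purely illustrative (a hypothetical helper using plain subprocess), not os-brick's actual connector code.]

import subprocess
import time


def wait_for_iscsi_sessions(attempts=3, delay=2.0):
    """Poll 'iscsiadm -m session' until iscsid has brought a session up."""
    for _ in range(attempts):
        result = subprocess.run(['iscsiadm', '-m', 'session'],
                                capture_output=True, text=True)
        # iscsiadm exits non-zero with "No active sessions" while the
        # session is still being established; retrying instead of giving
        # up is exactly what the single-attempt multipath path lacks.
        if result.returncode == 0:
            return result.stdout.splitlines()
        time.sleep(delay)
    raise RuntimeError('no active iSCSI sessions after %d attempts'
                       % attempts)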
[Yahoo-eng-team] [Bug 1937292] Re: All overcloud VM's powered off on hypervisor when nova_libvirt is restarted
[Expired for OpenStack Compute (nova) because there has been no
activity for 60 days.]

** Changed in: nova
       Status: Incomplete => Expired

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1937292

Title:
  All overcloud VM's powered off on hypervisor when nova_libvirt is
  restarted

Status in OpenStack Compute (nova):
  Expired
Status in tripleo:
  Invalid

Bug description:
  Description:
  Using TripleO. Noted that all VMs on a hypervisor are powered off
  during the overcloud deployment. (I only have one hypervisor, sorry;
  I can't tell you if it would happen to more than one hypervisor.)
  Seems to happen when the nova_libvirt container is restarted.

  Environment:
  TripleO - Master

  # podman exec -it nova_libvirt rpm -qa | grep nova
  python3-nova-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-compute-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-common-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-migration-23.1.0-0.20210625160814.1f6c351.el8.noarch
  python3-novaclient-17.5.0-0.20210601131008.f431295.el8.noarch

  Reproducer (at least for me):
  1. Start a VM
  2. Restart tripleo_nova_libvirt.service: systemctl restart tripleo_nova_libvirt.service
  3. All VMs are stopped

  Relevant logs:
  2021-07-22 16:31:05.532 3 DEBUG nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] No waiting events found dispatching network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:319
  2021-07-22 16:31:05.532 3 WARNING nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Received unexpected event network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb for instance with vm_state active and task_state None.
  2021-07-22 16:31:30.583 3 DEBUG nova.compute.manager [req-7be814ae-0e3d-4631-8a4c-348ead46c213 - - - - -] Triggering sync for uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 _sync_power_states /usr/lib/python3.6/site-packages/nova/compute/manager.py:9695
  2021-07-22 16:31:30.589 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
  2021-07-22 16:31:30.746 3 INFO nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
  2021-07-22 16:31:30.930 3 WARNING nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
  2021-07-22 16:31:30.931 3 DEBUG nova.compute.api [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Going to try to stop instance force_stop /usr/lib/python3.6/site-packages/nova/compute/api.py:2584
  2021-07-22 16:31:31.135 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" released by "nova.compute.manager.ComputeManager._sync_power_states.<locals>._sync.<locals>.query_driver_power_state_and_sync" :: held 0.547s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371
  2021-07-22 16:31:31.161 3 DEBUG oslo_concurrency.lockutils [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager.stop_instance.<locals>.do_stop_instance" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
  2021-07-22 16:31:31.162 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1561
  2021-07-22 16:31:31.165 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python3.6/site-packages/nova/compute/manager.py:3095
  2021-07-22 16:31:31.166 3 INFO nova.compute.manager [req-a8750
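[Editor's note: the log sequence is nova's periodic _sync_power_states at work: the DB believes the instance is RUNNING (power_state 1) while the hypervisor reports SHUTDOWN (4), so nova concludes the guest shut itself down and calls the stop API. A simplified sketch of that decision, using the power-state values visible in the log; the function itself is illustrative, not nova's actual code.]

# Power-state constants matching the values in the log above.
RUNNING = 1
SHUTDOWN = 4


def sync_power_state(db_power_state, vm_power_state, stop_instance):
    """Illustrative reduction of nova's _sync_instance_power_state."""
    if db_power_state == RUNNING and vm_power_state == SHUTDOWN:
        # Matches the log: "Instance shutdown by itself. Calling the stop
        # API." A libvirt restart that takes the guest processes down with
        # it is indistinguishable here from a guest-initiated shutdown,
        # which is why every VM on the hypervisor ends up powered off.
        stop_instance()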