[Yahoo-eng-team] [Bug 1944947] [NEW] NUMA instance with shared CPU policy cannot be restarted after upgrade to Victoria

2021-09-24 Thread Pierre Riteau
Public bug reported:

Description
===
NUMA instances without hw:cpu_policy=dedicated set cannot be restarted after 
upgrading from Ussuri to Victoria. The nova-compute service fails with the 
following exception:

NotImplementedError: Cannot load 'pcpuset' in the base class
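
For reference, a minimal sketch (illustrative only, not nova's actual object
definitions) of how oslo.versionedobjects produces this error when a field
added in a newer release is read from an object hydrated from an older
record without a data migration:

    from oslo_versionedobjects import base as ovo_base
    from oslo_versionedobjects import fields


    @ovo_base.VersionedObjectRegistry.register
    class FakeNUMACell(ovo_base.VersionedObject):
        # "pcpuset" stands in for the field added in the newer release;
        # cells serialized by the older release do not carry it.
        fields = {
            'cpuset': fields.SetOfIntegersField(),
            'pcpuset': fields.SetOfIntegersField(),
        }


    cell = FakeNUMACell(cpuset={0, 1})
    print(cell.cpuset)  # {0, 1}
    try:
        cell.pcpuset    # unset field -> obj_load_attr() in the base class
    except NotImplementedError as exc:
        print(exc)      # Cannot load 'pcpuset' in the base class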

Steps to reproduce
==
* Deploy Nova using Ussuri
* Upgrade Nova to Victoria
* openstack server stop 
* openstack server start 

Expected result
===
Instance should start.

Actual result
=
Instance doesn't start.

Environment
===
python3-nova-22.2.2-1.el8.noarch
openstack-nova-compute-22.2.2-1.el8.noarch

Packaged as Docker images by Kolla.

Logs & Configs
==
The following trace is logged: https://paste.openstack.org/show/809556/

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1944947

Title:
  NUMA instance with shared CPU policy cannot be restarted after upgrade
  to Victoria

Status in OpenStack Compute (nova):
  New

Bug description:
  Description
  ===
  NUMA instances without hw:cpu_policy=dedicated set cannot be restarted after 
upgrading from Ussuri to Victoria. The nova-compute service fails with the 
following exception:

  NotImplementedError: Cannot load 'pcpuset' in the base class

  Steps to reproduce
  ==
  * Deploy Nova using Ussuri
  * Upgrade Nova to Victoria
  * openstack server stop 
  * openstack server start 

  Expected result
  ===
  Instance should start.

  Actual result
  =
  Instance doesn't start.

  Environment
  ===
  python3-nova-22.2.2-1.el8.noarch
  openstack-nova-compute-22.2.2-1.el8.noarch

  Packaged as Docker images by Kolla.

  Logs & Configs
  ==
  The following trace is logged: https://paste.openstack.org/show/809556/

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1944947/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 1944948] [NEW] IPv6 slaac subnet creation causes FixedIpsSubnetsNotOnSameSegment error

2021-09-24 Thread Elvira García Ruiz
Public bug reported:

When trying to create an IPv6 SLAAC subnet in a multisegment network,
Neutron raises the FixedIpsSubnetsNotOnSameSegment error, but the subnet
is actually created.

Steps to reproduce:
$ openstack network create --share --provider-network-type geneve 
--provider-segment 777 test_net
SEGMENT=`openstack network segment list --network test_net | awk '/777/ {print 
$2}'`
$ openstack network segment set --name segment777 $SEGMENT
$ openstack network segment create --network-type geneve --segment 778 
--network test_net segment778
$ openstack subnet create --network test_net --network-segment segment777 
--ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
$ openstack subnet create --network test_net --network-segment segment778 
--ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
$ openstack subnet create --network test_net --network-segment segment777 
--ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode 
slaac segment777-v6

Expected result:
Subnet created with no errors

Actual result:
Subnet created, but API throws an exception:
BadRequestException: 400: Client Error for url: 
http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from different 
segments.

Version:
- Devstack (Neutron master)
- OVN 21.03

There's a Bugzilla about this topic [0]

[0] https://bugzilla.redhat.com/show_bug.cgi?id=1939601
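
For convenience, a rough reproduction/verification sketch using openstacksdk;
the cloud name "devstack" and the reuse of the names from the CLI steps above
are assumptions, not part of the report:

    import openstack
    from openstack import exceptions

    conn = openstack.connect(cloud='devstack')

    net = conn.network.find_network('test_net')
    segment = conn.network.find_segment('segment777')

    try:
        conn.network.create_subnet(
            network_id=net.id,
            segment_id=segment.id,
            ip_version=6,
            cidr='2001:10:77:7::/64',
            ipv6_address_mode='slaac',
            is_dhcp_enabled=True,
            name='segment777-v6',
        )
    except exceptions.BadRequestException as exc:
        # The API answers 400 "Cannot allocate addresses from different segments"...
        print('create_subnet failed:', exc)

    # ...yet the subnet still shows up on the network afterwards.
    for subnet in conn.network.subnets(network_id=net.id):
        print(subnet.name, subnet.cidr)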

** Affects: neutron
 Importance: Undecided
 Assignee: Elvira García Ruiz (elviragr)
 Status: New

** Description changed:

  When tries to create IPv6 SLAAC subnet in multisegment network it raises
  the FixedIpsSubnetsNotOnSameSegment error. But the subnet is actually
  created.
  
  Steps to reproduce:
- 
- openstack network create --share --provider-network-type geneve 
--provider-segment 777 test_net
+ $ openstack network create --share --provider-network-type geneve 
--provider-segment 777 test_net
  SEGMENT=`openstack network segment list --network test_net | awk '/777/ 
{print $2}'`
- openstack network segment set --name segment777 $SEGMENT
- openstack network segment create --network-type geneve --segment 778 
--network test_net segment778
- openstack subnet create --network test_net --network-segment segment777 
--ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4 
   
- openstack subnet create --network test_net --network-segment segment778 
--ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
- openstack subnet create --network test_net --network-segment segment777 
--ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode 
slaac segment777-v6 
+ $ openstack network segment set --name segment777 $SEGMENT
+ $ openstack network segment create --network-type geneve --segment 778 
--network test_net segment778
+ $ openstack subnet create --network test_net --network-segment segment777 
--ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
+ $ openstack subnet create --network test_net --network-segment segment778 
--ip-version 4 --subnet-range 10.77.8.0/24 --dhcp segment778-v4
+ $ openstack subnet create --network test_net --network-segment segment777 
--ip-version 6 --subnet-range 2001:10:77:7::/64 --dhcp --ipv6-address-mode 
slaac segment777-v6
  
  Expected result:
  Subnet created with no errors
  
  Actual result:
  Subnet created, but API throws an exception:
- 
- BadRequestException: 400: Client Error for url:
- http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from
- different segments.
+ BadRequestException: 400: Client Error for url: 
http://10.0.0.105:9696/v2.0/subnets, Cannot allocate addresses from different 
segments.
  
  Version:
- - Devstack (Neutron master) 
+ - Devstack (Neutron master)
  - OVN 21.03
  
  There's a Bugzilla about this topic [0]
  
  [0] https://bugzilla.redhat.com/show_bug.cgi?id=1939601

** Changed in: neutron
 Assignee: (unassigned) => Elvira García Ruiz (elviragr)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944948

Title:
  IPv6 slaac subnet creation causes FixedIpsSubnetsNotOnSameSegment
  error

Status in neutron:
  New

Bug description:
  When trying to create an IPv6 SLAAC subnet in a multisegment network,
  Neutron raises the FixedIpsSubnetsNotOnSameSegment error, but the
  subnet is actually created.

  Steps to reproduce:
  $ openstack network create --share --provider-network-type geneve 
--provider-segment 777 test_net
  SEGMENT=`openstack network segment list --network test_net | awk '/777/ 
{print $2}'`
  $ openstack network segment set --name segment777 $SEGMENT
  $ openstack network segment create --network-type geneve --segment 778 
--network test_net segment778
  $ openstack subnet create --network test_net --network-segment segment777 
--ip-version 4 --subnet-range 10.77.7.0/24 --dhcp segment777-v4
  $ openstack subnet create --network test_net --network-segm

[Yahoo-eng-team] [Bug 1944201] Re: neutron-openvswitch-agent crashes on start with firewall config of br-int

2021-09-24 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/810592
Committed: 
https://opendev.org/openstack/neutron/commit/57629dc05122b01a9ba76606b8d75cce9da40776
Submitter: "Zuul (22348)"
Branch: master

commit 57629dc05122b01a9ba76606b8d75cce9da40776
Author: Rodolfo Alonso Hernandez 
Date:   Thu Sep 23 09:15:09 2021 +

Add retry when executing OF commands if "InvalidDatapath"

When using the OF API (currently Neutron only uses the native
implementation via the "os-ken" library), retry the command in case of
an "InvalidDatapath" exception.

As commented in the related bug, some operations can restart the
OF controller (setting the OF protocols, setting the bridge fail mode).
During the controller restart, a command can return an "InvalidDatapath"
exception.

Closes-Bug: #1944201
Change-Id: Ia8d202f8a38362272e9519c1cbd9d6ba9359e0a1
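
A minimal sketch of the retry pattern the commit describes (not the actual
neutron change; the wrapper function and its arguments are placeholders):

    import tenacity
    from os_ken.app.ofctl.exception import InvalidDatapath


    @tenacity.retry(
        retry=tenacity.retry_if_exception_type(InvalidDatapath),
        wait=tenacity.wait_exponential(multiplier=0.1, max=2),
        stop=tenacity.stop_after_attempt(5),
        reraise=True,
    )
    def run_ofctl(ofctl_call, *args, **kwargs):
        # Re-run the OF command if the os-ken OF controller is restarting
        # (e.g. after setting the OF protocols or the bridge fail mode) and
        # answers with InvalidDatapath.
        return ofctl_call(*args, **kwargs)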


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1944201

Title:
  neutron-openvswitch-agent crashes on start with firewall config of br-
  int

Status in neutron:
  Fix Released

Bug description:
  In upstream CI, Ironic jobs have been encountering failures where we
  never find the networking to be stood up by neutron. Investigation
  into what was going on led us to find the neutron-openvswitch-agent in
  a failed state, exited due to a RuntimeError, just a few seconds after
  the service was started.

  neutron-openvswitch-agent[78787]: DEBUG neutron.agent.securitygroups_rpc 
[None req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] Init firewall 
settings (driver=openvswitch) {{(pid=78787) init_firewall 
/opt/stack/neutron/neutron/agent/securitygroups_rpc.py:118}}
  neutron-openvswitch-agent[78787]: DEBUG ovsdbapp.backend.ovs_idl.transaction 
[-] Running txn n=1 command(idx=0): DbAddCommand(table=Bridge, record=br-int, 
column=protocols, values=('OpenFlow10', 'OpenFlow11', 'OpenFlow12', 
'OpenFlow13', 'OpenFlow14')) {{(pid=78787) do_commit 
/usr/local/lib/python3.8/dist-packages/ovsdbapp/backend/ovs_idl/transaction.py:90}}
  neutron-openvswitch-agent[78787]: ERROR OfctlService [-] unknown dpid 
90695823979334
  neutron-openvswitch-agent[78787]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.openflow.native.ofswitch [None 
req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request 
version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1)
 error Datapath Invalid 90695823979334: 
os_ken.app.ofctl.exception.InvalidDatapath: Datapath Invalid 90695823979334
  neutron-openvswitch-agent[78787]: ERROR 
neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [None 
req-b18a79b7-7258-44f0-9a69-fa92a490bc26 None None] ofctl request 
version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1)
 error Datapath Invalid 90695823979334 agent terminated!: RuntimeError: ofctl 
request 
version=None,msg_type=None,msg_len=None,xid=None,OFPFlowStatsRequest(cookie=0,cookie_mask=0,flags=0,match=OFPMatch(oxm_fields={}),out_group=4294967295,out_port=4294967295,table_id=71,type=1)
 error Datapath Invalid 90695823979334
  systemd[1]: devstack@q-agt.service: Main process exited, code=exited, 
status=1/FAILURE
  systemd[1]: devstack@q-agt.service: Failed with result 'exit-code'.

  Originally, this was thought to be related to
  https://bugs.launchpad.net/neutron/+bug/1817022; however, this occurs
  upon service startup on a relatively low-load machine where the only
  action at that time is really just neutron starting. Also, at startup
  the connections have not existed long enough for inactivity idle
  triggers to occur.

  Investigation allowed us to identify the general path of what is
  occurring, yet we do not understand why, at least in the Ironic
  community.

  init_firewall() invocation: 
https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L70
  Firewall class launch: 
https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/securitygroups_rpc.py#L121

  The default firewall driver ends up sending us into openvswitch's
  firewall code:

  
https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L548
  
https://github.com/openstack/neutron/blob/79445f12be3a9ca892672fe0e016336ef60877a2/neutron/agent/linux/openvswitch_firewall/firewall.py#L628

  Which eventually ends up in
  
https://github.com/openstack/neutron/blob/master/neutron/plugins/ml2/drivers/openvswitch/agent/openflow/native/ofswitch.

[Yahoo-eng-team] [Bug 1943029] Re: [OVN][FT] During the OvnNorthd stop, the unix file could not exist

2021-09-24 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/807862
Committed: 
https://opendev.org/openstack/neutron/commit/1762ed883447fefbf08e11c309fc1f098374ea53
Submitter: "Zuul (22348)"
Branch: master

commit 1762ed883447fefbf08e11c309fc1f098374ea53
Author: Rodolfo Alonso Hernandez 
Date:   Wed Sep 8 11:46:58 2021 +

[OVN][FT] Check UNIX socket file before using it

When stopping "OvnNorthd", first check if the UNIX socket file is
still present before calling "ovs-appctl" command.

Closes-Bug: #1943029
Change-Id: Id70df8d28a258108acf88f36b2fb59b5df3a0857
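
A minimal sketch of the check described above (an assumed helper, not the
actual functional-test code):

    import os
    import subprocess


    def stop_ovn_db_server(ctl_path):
        """Ask the daemon behind ctl_path to exit, but only if its UNIX
        socket file still exists (the fixture may already have removed it)."""
        if not os.path.exists(ctl_path):
            return
        subprocess.run(['ovs-appctl', '-t', ctl_path, 'exit'], check=True)


    stop_ovn_db_server('/tmp/tmpo321snqk/ovnnb_db.ctl')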


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1943029

Title:
  [OVN][FT] During the OvnNorthd stop, the unix file could not exist

Status in neutron:
  Fix Released

Bug description:
  During the OVN functional tests, when the "OvnNorthd" is stopped at
  the end of the test, the UNIX ctl file may already have been deleted.
  Check for it first before executing the "ovs-appctl" command.

  E.g.:
  Running command: ['ovs-appctl', '-t', '/tmp/tmpo321snqk/ovnnb_db.ctl', 'exit']
  Exit code: 1; Cmd: ['ovs-appctl', '-t', '/tmp/tmpo321snqk/ovnnb_db.ctl', 
'exit']; Stdin: ; Stdout: ; Stderr: 
2021-09-08T11:18:44Z|1|unixctl|WARN|failed to connect to 
/tmp/tmpo321snqk/ovnnb_db.ctl
  ovs-appctl: cannot connect to "/tmp/tmpo321snqk/ovnnb_db.ctl" (No such file 
or directory)

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1943029/+subscriptions




[Yahoo-eng-team] [Bug 1943977] Re: os-brick fails to retry "iscsiadm -m session" when iser_use_multipath and iscsi_use_multipath are set in nova.conf

2021-09-24 Thread Lee Yarwood
** Project changed: nova => os-brick

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1943977

Title:
  os-brick fails to retry "iscsiadm -m session" when iser_use_multipath
  and iscsi_use_multipath are set in nova.conf

Status in os-brick:
  New

Bug description:
  When attempting to live migrate a VM to a new compute node, it fails
  and remains on the original compute node.

  Reviewing the logs, I can see the iSCSI session is not currently
  connected and needs to be brought up, but it then fails with "ERROR
  oslo_messaging.rpc.server TargetPortalNotFound: Unable to find target
  portal 1.1.1.1:3260".

  With "iser_use_multipath" and "iscsi_use_multipath" set false in
  nova.conf I can see the initial os_bricks fails with "No Active
  sessions" as iscsid has yet to bring up the session but the second try
  from os_bricks then succeeds as by this time iscsid has brought the
  session up.

  With "iser_use_multipath" and "iscsi_use_multipath" set true in
  nova.conf I can see the initial os_bricks fails with "No Active
  sessions" as iscsid has yet to bring up the session but no second
  attempt from os_bricks leads to the "TargetPortalNotFound".

  I'm running "os_brick-2.5.10". should os_bricks retry when using
  multipath in nova.conf?
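
  For illustration, a rough sketch (not os-brick code; the attempt count
  and delay are assumptions) of the kind of retry around
  "iscsiadm -m session" being asked for here:

      import subprocess
      import time


      def get_iscsi_sessions(attempts=3, delay=1.0):
          # "iscsiadm -m session" exits non-zero while iscsid is still
          # bringing the session up; retry briefly before concluding
          # nothing is there.
          for _ in range(attempts):
              result = subprocess.run(['iscsiadm', '-m', 'session'],
                                      capture_output=True, text=True)
              if result.returncode == 0:
                  return result.stdout.splitlines()
              time.sleep(delay)
          return []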

To manage notifications about this bug go to:
https://bugs.launchpad.net/os-brick/+bug/1943977/+subscriptions




[Yahoo-eng-team] [Bug 1937292] Re: All overcloud VM's powered off on hypervisor when nova_libvirt is restarted

2021-09-24 Thread Launchpad Bug Tracker
[Expired for OpenStack Compute (nova) because there has been no activity
for 60 days.]

** Changed in: nova
   Status: Incomplete => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1937292

Title:
  All overcloud VM's powered off on hypervisor when nova_libvirt is
  restarted

Status in OpenStack Compute (nova):
  Expired
Status in tripleo:
  Invalid

Bug description:
  Description:

  Using TripleO. I noted that all VMs on a hypervisor are powered off
  during the overcloud deployment. (I only have one hypervisor, sorry,
  so I can't tell you whether it would happen to more than one.)

  Seems to happen when the nova_libvirt container is restarted.

  Environment:
  TripleO - Master
  # podman exec -it nova_libvirt rpm -qa | grep nova
  python3-nova-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-compute-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-common-23.1.0-0.20210625160814.1f6c351.el8.noarch
  openstack-nova-migration-23.1.0-0.20210625160814.1f6c351.el8.noarch
  python3-novaclient-17.5.0-0.20210601131008.f431295.el8.noarch

  Reproducer:
  At least for me:
  1. Start a VM
  2. Restart tripleo_nova_libvirt.service:
  systemctl restart tripleo_nova_libvirt.service
  3. All VM's are stopped
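
  As a reading aid for the logs below, a minimal sketch (not nova's
  actual code) of the _sync_instance_power_state decision visible there:
  the hypervisor reports SHUTDOWN (4) while the DB still says RUNNING (1)
  for an active instance, so nova assumes the guest shut itself down and
  calls the stop API, which is what powers the instances off.

      # nova.compute.power_state values seen in the log below.
      RUNNING = 1
      SHUTDOWN = 4


      def sync_instance_power_state(vm_state, db_power_state,
                                    vm_power_state, stop_api):
          if vm_state != 'active':
              return
          if db_power_state == RUNNING and vm_power_state == SHUTDOWN:
              # "Instance shutdown by itself. Calling the stop API."
              stop_api()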

  Relevant logs:
  2021-07-22 16:31:05.532 3 DEBUG nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] No waiting events found dispatching network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb pop_instance_event /usr/lib/python3.6/site-packages/nova/compute/manager.py:319
  2021-07-22 16:31:05.532 3 WARNING nova.compute.manager [req-19a38d0b-e019-472b-95c4-03c796040767 d2ab1d5792604ba094af82d7447e88cf c4740b2aba4147adb7f101a2782003c3 - default default] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Received unexpected event network-vif-plugged-d9b29fef-cd87-41db-ba79-8b8c65b74efb for instance with vm_state active and task_state None.
  2021-07-22 16:31:30.583 3 DEBUG nova.compute.manager [req-7be814ae-0e3d-4631-8a4c-348ead46c213 - - - - -] Triggering sync for uuid b28cc3ae-6442-40cf-9d66-9d4938a567c7 _sync_power_states /usr/lib/python3.6/site-packages/nova/compute/manager.py:9695
  2021-07-22 16:31:30.589 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager._sync_power_states.._sync..query_driver_power_state_and_sync" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
  2021-07-22 16:31:30.746 3 INFO nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] During _sync_instance_power_state the DB power_state (1) does not match the vm_power_state from the hypervisor (4). Updating power_state in the DB to match the hypervisor.
  2021-07-22 16:31:30.930 3 WARNING nova.compute.manager [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Instance shutdown by itself. Calling the stop API. Current vm_state: active, current task_state: None, original DB power_state: 1, current VM power_state: 4
  2021-07-22 16:31:30.931 3 DEBUG nova.compute.api [-] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Going to try to stop instance force_stop /usr/lib/python3.6/site-packages/nova/compute/api.py:2584
  2021-07-22 16:31:31.135 3 DEBUG oslo_concurrency.lockutils [-] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" released by "nova.compute.manager.ComputeManager._sync_power_states.._sync..query_driver_power_state_and_sync" :: held 0.547s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:371
  2021-07-22 16:31:31.161 3 DEBUG oslo_concurrency.lockutils [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] Lock "b28cc3ae-6442-40cf-9d66-9d4938a567c7" acquired by "nova.compute.manager.ComputeManager.stop_instance..do_stop_instance" :: waited 0.000s inner /usr/lib/python3.6/site-packages/oslo_concurrency/lockutils.py:359
  2021-07-22 16:31:31.162 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Checking state _get_power_state /usr/lib/python3.6/site-packages/nova/compute/manager.py:1561
  2021-07-22 16:31:31.165 3 DEBUG nova.compute.manager [req-a87509b3-9674-49df-ad1f-9f8967871e10 - - - - -] [instance: b28cc3ae-6442-40cf-9d66-9d4938a567c7] Stopping instance; current vm_state: active, current task_state: powering-off, current DB power_state: 4, current VM power_state: 4 do_stop_instance /usr/lib/python3.6/site-packages/nova/compute/manager.py:3095
  2021-07-22 16:31:31.166 3 INFO nova.compute.manager [req-a8750