[Yahoo-eng-team] [Bug 2086691] [NEW] nova-novncproxy is printing symbols to journal as I open more than 1 VNC console

2024-11-05 Thread Mustafa Kamaal Ahmed
Public bug reported:

Description
===========

I am getting the following output in the journal. While I am using the
VNC console, the journal fills up with these messages, consuming disk
space with meaningless symbols.

Logs & Configs
==============

Here's a snippet of it:
Nov 05 08:10:39 226host1 nova-novncproxy[2456742]: <}<}<{<}>{<
Nov 05 08:10:39 226host1 nova-novncproxy[2456744]: <
Nov 05 08:10:39 226host1 nova-novncproxy[2456742]: <}>>}>
Nov 05 08:10:39 226host1 nova-novncproxy[2456744]: <<
Nov 05 08:10:39 226host1 nova-novncproxy[2456742]: <{<<{<}>}>}>{<
Nov 05 08:10:39 226host1 nova-novncproxy[2456744]: <<{<
Nov 05 08:10:40 226host1 nova-novncproxy[2456742]: <<}>}>}>}>}>}>{<
Nov 05 08:10:40 226host1 nova-novncproxy[2456744]: <>{<
Nov 05 08:10:40 226host1 nova-novncproxy[2456742]: <>
Nov 05 08:10:41 226host1 nova-novncproxy[2456744]: <}>}>>{<{<{<}}>>}>}>}>
Nov 05 08:10:52 226host1 nova-novncproxy[2456742]: 
<>{<}>{<}>}>}>}>{<}>{<}>{<}>}>}>{<{<}>{<}>{<}>{<}>{<}}>>}>}>{<}>{<}>{<}>}>{<{<}>{<}>{<}>{<}>{<}>{<}>{<}>{}<>}>}>{<}>}>{<{<}>{<}>{<}>}>}>}>{<}>{<}>{<}>{<}>{<}>}>}>}>}>}>}>}>}>}>{<}>}>}>{{<<{{<<{{<<}>}}>}>}>}>}>>}>}>}>}>}>}}>>}>}>{<}>{<}>}>
Nov 05 08:11:02 226host1 nova-novncproxy[2456744]: 
<>}>}>}>{<}>{<}>{<{<}>{<}>{<{{<<{{<<{{<<}>}>}>{<}>{<}>{<}>{<}>{<}>}>{<{<}>{<}>{<}>{<}>{<}>}>}>}>}>}>{<{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>}>}>}>}>}>}>}>}>}>}>}>}}>>}>}>}>}>}>}>}>}>
Nov 05 08:11:02 226host1 nova-novncproxy[2456742]: <<
Nov 05 08:11:02 226host1 nova-novncproxy[2456744]: <<
Nov 05 08:11:02 226host1 nova-novncproxy[2456742]: <>
Nov 05 08:12:04 226host1 nova-novncproxy[2456744]: 
<>}>}>}>{<}>{<}>{<}>{<}>{<{<}>{<}>{<}>{<}>}>}>}>{<}>{<}>}>}}>>}>}>}>}>}>
Nov 05 08:13:04 226host1 nova-novncproxy[2456742]: <<}>
Nov 05 08:13:06 226host1 nova-novncproxy[2456744]: <<}>
Nov 05 08:14:05 226host1 nova-novncproxy[2456742]: <<}>{<}>
Nov 05 08:15:03 226host1 nova-novncproxy[2456744]: <<}>
Nov 05 08:15:05 226host1 nova-novncproxy[2456742]: <<}>
Nov 05 08:16:04 226host1 nova-novncproxy[2456744]: <<}>
Nov 05 08:16:06 226host1 nova-novncproxy[2456742]: <<}>
Nov 05 08:17:04 226host1 nova-novncproxy[2456744]: <<}>{<}>
Nov 05 08:17:20 226host1 nova-novncproxy[2456742]: <<}>
Nov 05 08:23:31 226host1 nova-novncproxy[2456742]: 
<<}>}>}>}>}>{<}>{<}>{<}>}>}>}>{<}>{<}>{<}>}>{<}>}>{<}>}>}>{<}>{<}>{<}>{<}>{<}}>>}>}>}>}>{<}>{<}>{<}>{<}}>>}>}>{<}>{<}>}>}>}>}>}>}>}>}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}}>>}>}>}>{<}>}>}>}>}>{<}>{<}>}>}>}>}}>>}>}>}>}>}>}>}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>{<}>}>}>}>}>}>}>}>}>}>}>}>}>}>{<}>}>}>}>}>{<}>{<}>}>{<}>{<}>{<}>

Environment
===========
1. OpenStack version:

$ rpm -qa | grep nova
openstack-nova-common-24.1.1-1.el8.noarch
openstack-nova-conductor-24.1.1-1.el8.noarch
openstack-nova-scheduler-24.1.1-1.el8.noarch
python3-novaclient-17.6.0-1.el8.noarch
openstack-nova-novncproxy-24.1.1-1.el8.noarch
openstack-nova-compute-24.1.1-1.el8.noarch
python3-nova-24.1.1-1.el8.noarch
openstack-nova-api-24.1.1-1.el8.noarch

2. Hypervisor: Libvirt + KVM

$ libvirtd --version
libvirtd (libvirt) 8.0.0

$ rpm -qa | grep kvm
qemu-kvm-ui-spice-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-core-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-curl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-hw-usbredir-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-common-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-docs-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-rbd-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-ssh-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
libvirt-daemon-kvm-8.0.0-10.module_el8.7.0+1218+f626c2ff.x86_64
qemu-kvm-block-gluster-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-block-iscsi-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64
qemu-kvm-ui-opengl-6.2.0-20.module_el8.7.0+1218+f626c2ff.1.x86_64


3. Storage: SAN storage connected via iSCSI and managed with LVM

$ lvm version
  LVM version: 2.03.14(2)-RHEL8 (2021-10-20)
  Library version: 1.02.181-RHEL8 (2021-10-20)
  Driver version:  4.46.0

$ iscsiadm --version
iscsiadm version 6.2.1.4-1


Steps to reproduce
==================

- Launch OpenStack on CentOS Stream 8.
- Upload a custom Windows image in .raw format with the latest SPICE guest
tools and cloud-init.
- Create a VM and open its console from the Horizon dashboard.

Expected result
===============
No random output in the journal.

Actual result
=============

Random output appears in the journal when running "journalctl -u
openstack-nova-novncproxy.service -f".


I figured out it is only stdout, since setting StandardOutput=null in the
service unit file stops the output.
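
For reference, a minimal systemd drop-in that reproduces that workaround
(assuming the unit name openstack-nova-novncproxy.service; this only hides
the output, it does not address why the symbols are printed):

    # /etc/systemd/system/openstack-nova-novncproxy.service.d/override.conf
    [Service]
    StandardOutput=null

    $ systemctl daemon-reload
    $ systemctl restart openstack-nova-novncproxy.service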

** Affects: nova
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2086691

Title:
  nova-novncproxy is printing symbols to journal as I open more than 1
  VNC console

[Yahoo-eng-team] [Bug 2072154] Re: Port status goes BUILD when migrating non-sriov instance in sriov setting.

2024-11-05 Thread Edward Hope-Morley
** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/caracal
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/yoga
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/epoxy
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/bobcat
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/antelope
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/dalmation
   Importance: Undecided
   Status: New

** No longer affects: neutron (Ubuntu Focal)

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2072154

Title:
  Port status goes BUILD when migrating non-sriov instance in sriov
  setting.

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  New
Status in Ubuntu Cloud Archive dalmation series:
  New
Status in Ubuntu Cloud Archive epoxy series:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  In Progress
Status in neutron source package in Jammy:
  In Progress
Status in neutron source package in Noble:
  In Progress
Status in neutron source package in Oracular:
  In Progress

Bug description:
  [ Impact ]

  Port status goes BUILD when migrating non-sriov instance in sriov
  setting.

  [ Test Plan ]

  1. Deploy OpenStack using Juju & Charms ( upstream also has the same code )
  2. Enable SRIOV
  3. create a vm without sriov nic. (test)
  4. migrate it to another host
  - openstack server migrate --live-migration --os-compute-api-version 2.30 
--host node-04.maas test
  5. check port status
  - https://paste.ubuntu.com/p/RKGnP76MvB/

  [ Where problems could occur ]

  This patch is related to the SR-IOV agent: it adds a check for whether a
  port is SR-IOV or not, so it is possible that an SR-IOV port could be
  handled improperly.

  [ Other Info ]

  The nova-compute node runs both neutron-sriov-nic-agent and
  neutron-ovn-metadata-agent.

  So far, I have verified that ovn_monitor changes the port status to
  ACTIVE, but sriov-nic-agent changes it back to BUILD by calling
  _get_new_status:

  ./plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py
  binding_activate
  - get_device_details_from_port_id
  - get_device_details
  - _get_new_status < this makes status BUILD.

  Since the execution order is not fixed, the port sometimes ends up ACTIVE
  and sometimes BUILD.
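
  A rough, self-contained illustration of the guard described above under
  "Where problems could occur" (names and data are invented for
  illustration only; this is not the actual neutron SR-IOV agent code):

    # Only recompute the status of ports the SR-IOV agent actually manages;
    # non-SR-IOV ports keep whatever status the OVN driver already set.
    def is_sriov_port(port):
        # Assumed heuristic: SR-IOV ports carry a pci_slot in their binding profile.
        return bool(port.get('binding_profile', {}).get('pci_slot'))

    def sriov_agent_binding_activate(port):
        """Return the status the SR-IOV agent would leave on this port."""
        if not is_sriov_port(port):
            # Non-SR-IOV port: leave the status (typically ACTIVE) untouched.
            return port['status']
        # For real SR-IOV ports BUILD is a legitimate transient state while
        # the VF is configured (mirrors _get_new_status in the report).
        return 'BUILD'

    if __name__ == '__main__':
        ovn_port = {'id': 'p1', 'status': 'ACTIVE', 'binding_profile': {}}
        sriov_port = {'id': 'p2', 'status': 'DOWN',
                      'binding_profile': {'pci_slot': '0000:3b:02.1'}}
        print(sriov_agent_binding_activate(ovn_port))    # ACTIVE (untouched)
        print(sriov_agent_binding_activate(sriov_port))  # BUILD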

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2072154/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2085543] Re: [OVN] Port device_owner is not set in the Trunk subport

2024-11-05 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/933836
Committed: https://opendev.org/openstack/neutron/commit/c0bdb0c8a33286acb4d44ad865f309fc79b6
Submitter: "Zuul (22348)"
Branch:master

commit c0bdb0c8a33286acb4d44ad865f309fc79b6
Author: Rodolfo Alonso Hernandez 
Date:   Wed Oct 30 18:08:15 2024 +

[OVN] Check LSP.up status before setting the port host info

Before updating the Logical_Switch_Port host information, it is
necessary to check the current status of the port. If it does not match
the event triggering this update, the host information is not updated.

Closes-Bug: #2085543
Change-Id: I92afb190375caf27c815f9fe1cb627e87c49d4ca


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2085543

Title:
  [OVN] Port device_owner is not set in the Trunk subport

Status in neutron:
  Fix Released

Bug description:
  This issue was found in the test
  
``neutron_tempest_plugin.scenario.test_trunk.TrunkTest.test_trunk_subport_lifecycle``.
  The subport "463f7c45-3c06-4340-a509-e96b2faae525" [1] is created and
  then assigned to a trunk. When the port is assigned to the trunk, the
  device owner is not updated to "trunk:subport"

  UPDATE: the description is not accurate. The problem is that the port
  deactivation is executed before the port activation has finished. When
  the port is activated (the VM starts and binds the parent port and
  subports), the method ``set_port_status_up`` is called from the event
  ``LogicalSwitchPortUpdateUpEvent``. The problem is that the event
  actions are executed in a loop thread
  (``RowEventHandler.notify_loop``) that is not synchronous with the API
  call. The API call exits before ``set_port_status_up`` finishes.

  The tempest test checks that the subport is ACTIVE and proceeds to
  unbind it (remove from the trunk). That removes the port device_owner
  and binding host. That's a problem because the method
  ``set_port_status_up`` is still being executed and needs the "older"
  values (device_owner="trunk:subport").

  In a nutshell, this is a race condition because the OVN event
  processing is done asynchronously to the API call.
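
  A minimal sketch of the check named in the fix above ("check LSP.up
  before setting the port host info"); all names and the external_ids key
  are invented for illustration, this is not the actual OVN mech driver
  code:

    # Apply host info only if the Logical_Switch_Port state still matches
    # the event being processed, so a stale queued event cannot clobber
    # values written after the port was already unbound.
    def update_lsp_host_info(lsp, event_is_up, host):
        if bool(lsp.get('up')) != event_is_up:
            # Port state changed since the event was queued; skip the update.
            return False
        if event_is_up:
            lsp['external_ids']['neutron:host_id'] = host
        else:
            lsp['external_ids'].pop('neutron:host_id', None)
        return True

    if __name__ == '__main__':
        lsp = {'name': 'subport-463f7c45', 'up': False, 'external_ids': {}}
        # A delayed "port up" event arrives after the subport was unbound:
        print(update_lsp_host_info(lsp, event_is_up=True, host='compute-1'))  # False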

  Logs:
  https://f918f4eca95000e5dd6c-6bcda3a769a6c31ee12f465dd60bb9a2.ssl.cf5.rackcdn.com/933210/3/check/neutron-tempest-plugin-ovn-10/43a1557/testr_results.html

  [1]https://paste.opendev.org/show/bzmtiytDBKKkgi4IgZ15/

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2085543/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2086675] [NEW] Suspected performance regression for RBD back end linked to location sorting

2024-11-05 Thread Andrew Bonney
Public bug reported:

Hi,
For the past few releases (Caracal, Bobcat, maybe earlier) we have noticed that 
listing images is particularly slow. I've finally had some time to dig into 
this, and I believe I've tracked it to 
https://review.opendev.org/c/openstack/glance/+/886811

We're running the RBD back end primarily, but also have HTTP and Cinder
listed in 'enabled_backends', meaning that sort_image_locations executes
in full. It appears that when this function executes for the RBD back
end, it causes a connection to be opened to the back end. When doing a
full image list operation, this happens once for every image in the list
(the connection is not re-used). This appears to carry a 20-30ms time
penalty per image. As such, for any reasonable set of images the
response ends up taking several seconds.

In our case, images are unlikely to be held in more than one back end at
a time, and I noted that adding a length check to the locations list in
https://github.com/openstack/glance/blob/master/glance/common/utils.py#L718
so that the sorting doesn't occur when the list has just one element
resolves the performance issue entirely.
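
A rough sketch of that length-check guard (simplified names; the real
function in glance/common/utils.py takes different arguments, so treat this
as an illustration of the idea rather than the actual patch):

    # Skip the store-specific weighing (which, for RBD, opens a backend
    # connection per image) when there is nothing to sort.
    def sort_image_locations(locations, weigh_location):
        if len(locations) <= 1:
            # Single or no location: avoid the ~20-30 ms per-image penalty.
            return locations
        return sorted(locations, key=weigh_location)

    if __name__ == '__main__':
        one = [{'url': 'rbd://pool/image-1', 'metadata': {}}]
        print(sort_image_locations(one, weigh_location=lambda loc: 0))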

Whilst a length check is a workaround, does the sorting operation
actually require connections to be opened to the RBD back end? If they
are required, could the connections at least be re-used to avoid this
time penalty growing linearly with the number of images held by Glance?

Thanks

** Affects: glance
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Glance.
https://bugs.launchpad.net/bugs/2086675

Title:
  Suspected performance regression for RBD back end linked to location
  sorting

Status in Glance:
  New

Bug description:
  Hi,
  For the past few releases (Caracal, Bobcat, maybe earlier) we have noticed 
that listing images is particularly slow. I've finally had some time to dig 
into this, and I believe I've tracked it to 
https://review.opendev.org/c/openstack/glance/+/886811

  We're running the RBD back end primarily, but also have HTTP and
  Cinder listed in 'enabled_backends', meaning that sort_image_locations
  executes in full. It appears that when this function executes for the
  RBD back end, it causes a connection to be opened to the back end.
  When doing a full image list operation, this happens once for every
  image in the list (the connection is not re-used). This appears to
  carry a 20-30ms time penalty per image. As such, for any reasonable
  set of images the response ends up taking several seconds.

  In our case, images are unlikely to be held in more than one back end
  at a time, and I noted that adding a length check to the locations
  list in
  https://github.com/openstack/glance/blob/master/glance/common/utils.py#L718
  so that the sorting doesn't occur when the list has just one element
  resolves the performance issue entirely.

  Whilst a length check is a workaround, does the sorting operation
  actually require connections to be opened to the RBD back end? If they
  are required, could the connections at least be re-used to avoid this
  time penalty growing linearly with the number of images held by
  Glance?

  Thanks

To manage notifications about this bug go to:
https://bugs.launchpad.net/glance/+bug/2086675/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2072154] Re: Port status goes BUILD when migrating non-sriov instance in sriov setting.

2024-11-05 Thread Edward Hope-Morley
** Changed in: cloud-archive/epoxy
   Status: New => Fix Released

** Changed in: cloud-archive/dalmation
   Status: New => Fix Released

** Changed in: neutron (Ubuntu Oracular)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2072154

Title:
  Port status goes BUILD when migrating non-sriov instance in sriov
  setting.

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  New
Status in Ubuntu Cloud Archive dalmation series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in neutron:
  Fix Released
Status in neutron package in Ubuntu:
  In Progress
Status in neutron source package in Jammy:
  In Progress
Status in neutron source package in Noble:
  In Progress
Status in neutron source package in Oracular:
  Fix Released

Bug description:
  [ Impact ]

  Port status goes BUILD when migrating non-sriov instance in sriov
  setting.

  [ Test Plan ]

  1. Deploy OpenStack using Juju & Charms ( upstream also has the same code )
  2. Enable SRIOV
  3. create a vm without sriov nic. (test)
  4. migrate it to another host
  - openstack server migrate --live-migration --os-compute-api-version 2.30 
--host node-04.maas test
  5. check port status
  - https://paste.ubuntu.com/p/RKGnP76MvB/

  [ Where problems could occur ]

  This patch is related to the SR-IOV agent: it adds a check for whether a
  port is SR-IOV or not, so it is possible that an SR-IOV port could be
  handled improperly.

  [ Other Info ]

  The nova-compute node runs both neutron-sriov-nic-agent and
  neutron-ovn-metadata-agent.

  So far, I have verified that ovn_monitor changes the port status to
  ACTIVE, but sriov-nic-agent changes it back to BUILD by calling
  _get_new_status:

  ./plugins/ml2/drivers/mech_sriov/agent/sriov_nic_agent.py
  binding_activate
  - get_device_details_from_port_id
  - get_device_details
  - _get_new_status < this makes status BUILD.

  Since the execution order is not fixed, the port sometimes ends up ACTIVE
  and sometimes BUILD.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2072154/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2085946] Re: [OVN] Revision number registers must be filtered by resource ID and type

2024-11-05 Thread OpenStack Infra
Reviewed:  https://review.opendev.org/c/openstack/neutron/+/933752
Committed: https://opendev.org/openstack/neutron/commit/a298a37fe7ee41d25db02fdde36e134b01ef5d9a
Submitter: "Zuul (22348)"
Branch:master

commit a298a37fe7ee41d25db02fdde36e134b01ef5d9a
Author: Rodolfo Alonso Hernandez 
Date:   Wed Oct 30 00:58:16 2024 +

[OVN] Fix the revision number retrieval method

The "ovn_revision_numbers" table has a unique constraint that is a
combination of the "resource_uuid" and the "resource_type". There is
a case where the resource_uuid can be the same for two registers.
A router interface will create a single Neutron DB register ("ports")
but it will require two OVN DB registers ("Logical_Switch_Port" and
"Logical_Router_Ports"). In this case it is necessary to define the
"resource_type" when retrieving the revision number.

The exception "RevisionNumberNotDefined" will be thrown if only the
"resource_uuid" is provided in the related case.

Closes-Bug: #2085946
Change-Id: I12079de78773f7409503392d4791848aea90cb7b


** Changed in: neutron
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2085946

Title:
  [OVN] Revision number registers must be filtered by resource ID and
  type

Status in neutron:
  Fix Released

Bug description:
  The OVN revision numbers have a multicolumn index: (resource_uuid,
  resource_type) [1]. This is needed in particular because of the Neutron
  ports that belong to a router. A router interface is a single Neutron
  register ("ports"), but in OVN two registers are created:
  "Logical_Switch_Ports" and "Logical_Router_Ports".

  When retrieving a register "ovn_revision_numbers" from the Neutron
  database, it is needed to provide both the resource_uuid and the
  resource_type [2].

  
  [1] https://github.com/openstack/neutron/blob/febdfb5d8b1cf261c13b40e330d91a5bcb6c7642/neutron/db/models/ovn.py#L41-L46
  [2] https://github.com/openstack/neutron/blob/febdfb5d8b1cf261c13b40e330d91a5bcb6c7642/neutron/db/ovn_revision_numbers_db.py#L159-L167
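
  A small, self-contained illustration of why the lookup must filter on
  both columns (invented data and function names; not the actual neutron
  code):

    # Revision rows are keyed by (resource_uuid, resource_type); a lookup by
    # uuid alone is ambiguous for a router interface port, which has one
    # Neutron register but two OVN registers.
    REVISIONS = {
        ('port-uuid-1', 'ports'): 7,         # Logical_Switch_Port side
        ('port-uuid-1', 'router_ports'): 3,  # Logical_Router_Port side
    }

    def get_revision(resource_uuid, resource_type=None):
        if resource_type is None:
            matches = [rev for (uuid, _), rev in REVISIONS.items()
                       if uuid == resource_uuid]
            if len(matches) != 1:
                # Mirrors the RevisionNumberNotDefined case from the fix.
                raise LookupError('ambiguous revision for %s' % resource_uuid)
            return matches[0]
        return REVISIONS[(resource_uuid, resource_type)]

    if __name__ == '__main__':
        print(get_revision('port-uuid-1', 'router_ports'))  # 3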

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2085946/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2017748] Re: [SRU] OVN: ovnmeta namespaces missing during scalability test causing DHCP issues

2024-11-05 Thread Edward Hope-Morley
Deleted the debdiffs previously submitted, as per @haleyb, since they need
resubmitting now that the upstream backports have been done.

** Changed in: cloud-archive/yoga
   Status: Fix Committed => New

** Also affects: neutron (Ubuntu Noble)
   Importance: Undecided
   Status: New

** Also affects: neutron (Ubuntu Oracular)
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/epoxy
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/bobcat
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/dalmation
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/antelope
   Importance: Undecided
   Status: New

** Also affects: cloud-archive/caracal
   Importance: Undecided
   Status: New

** Patch removed: "focal_yoga.debdiff"
   https://bugs.launchpad.net/cloud-archive/+bug/2017748/+attachment/5759481/+files/focal_yoga.debdiff

** Patch removed: "jammy.debdiff"
   https://bugs.launchpad.net/cloud-archive/+bug/2017748/+attachment/5778137/+files/jammy.debdiff

** Changed in: cloud-archive/yoga
 Assignee: Hua Zhang (zhhuabj) => (unassigned)

** Changed in: cloud-archive/epoxy
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2017748

Title:
  [SRU] OVN:  ovnmeta namespaces missing during scalability test causing
  DHCP issues

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  New
Status in Ubuntu Cloud Archive dalmation series:
  New
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in neutron:
  New
Status in neutron ussuri series:
  Fix Released
Status in neutron victoria series:
  New
Status in neutron wallaby series:
  New
Status in neutron xena series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  New
Status in neutron source package in Jammy:
  New
Status in neutron source package in Noble:
  New
Status in neutron source package in Oracular:
  New

Bug description:
  [Impact]

  ovnmeta- namespaces are intermittently missing, and then the VMs cannot be reached.

  [Test Case]
  Not able to reproduce this easily, so I ran charmed-openstack-tester; the
  results are below:

  ==
  Totals
  ==
  Ran: 469 tests in 4273.6309 sec.
   - Passed: 398
   - Skipped: 69
   - Expected Fail: 0
   - Unexpected Success: 0
   - Failed: 2
  Sum of execute time for each test: 4387.2727 sec.

  The 2 failed tests
  (tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
  octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
  are not related to the fix.

  [Where problems could occur]
  These patches are related to the OVN metadata agent on compute nodes.
  VM connectivity could possibly be affected by this patch when OVN is used.
  Binding a port to a datapath could be affected.

  [Others]

  == ORIGINAL DESCRIPTION ==

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650

  During a scalability test it was noted that a few VMs were having
  issues being pinged (2 out of ~5000 VMs in the test conducted). After
  some investigation it was found that the VMs in question did not
  receive a DHCP lease:

  udhcpc: no lease, failing
  FAIL
  checking http://169.254.169.254/2009-04-04/instance-id
  failed 1/20: up 181.90. request failed

  And the ovnmeta- namespaces for the networks that the VMs were booting
  from were missing. Looking into the ovn-metadata-agent.log:

  2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
  [-] There is no metadata port for network
  9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
  configured, tearing the namespace down if needed _get_provision_params
  /usr/lib/python3.9/site-
  packages/neutron/agent/ovn/metadata/agent.py:495

  Apparently, when the system is under stress (scalability tests) the

[Yahoo-eng-team] [Bug 2017748] Re: [SRU] OVN: ovnmeta namespaces missing during scalability test causing DHCP issues

2024-11-05 Thread Brian Haley
** Changed in: cloud-archive/dalmation
   Status: New => Fix Released

** Changed in: cloud-archive/caracal
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2017748

Title:
  [SRU] OVN:  ovnmeta namespaces missing during scalability test causing
  DHCP issues

Status in Ubuntu Cloud Archive:
  Fix Released
Status in Ubuntu Cloud Archive antelope series:
  New
Status in Ubuntu Cloud Archive bobcat series:
  New
Status in Ubuntu Cloud Archive caracal series:
  Fix Released
Status in Ubuntu Cloud Archive dalmation series:
  Fix Released
Status in Ubuntu Cloud Archive epoxy series:
  Fix Released
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in neutron:
  New
Status in neutron ussuri series:
  Fix Released
Status in neutron victoria series:
  New
Status in neutron wallaby series:
  New
Status in neutron xena series:
  New
Status in neutron package in Ubuntu:
  Fix Released
Status in neutron source package in Focal:
  New
Status in neutron source package in Jammy:
  New
Status in neutron source package in Noble:
  New
Status in neutron source package in Oracular:
  New

Bug description:
  [Impact]

  ovnmeta- namespaces are intermittently missing, and then the VMs cannot be reached.

  [Test Case]
  Not able to reproduce this easily, so I ran charmed-openstack-tester; the
  results are below:

  ==
  Totals
  ==
  Ran: 469 tests in 4273.6309 sec.
   - Passed: 398
   - Skipped: 69
   - Expected Fail: 0
   - Unexpected Success: 0
   - Failed: 2
  Sum of execute time for each test: 4387.2727 sec.

  The 2 failed tests
  (tempest.api.object_storage.test_account_quotas.AccountQuotasTest and
  octavia_tempest_plugin.tests.scenario.v2.test_traffic_ops.TrafficOperationsScenarioTest)
  are not related to the fix.

  [Where problems could occur]
  These patches are related to the OVN metadata agent on compute nodes.
  VM connectivity could possibly be affected by this patch when OVN is used.
  Binding a port to a datapath could be affected.

  [Others]

  == ORIGINAL DESCRIPTION ==

  Reported at: https://bugzilla.redhat.com/show_bug.cgi?id=2187650

  During a scalability test it was noted that a few VMs were having
  issues being pinged (2 out of ~5000 VMs in the test conducted). After
  some investigation it was found that the VMs in question did not
  receive a DHCP lease:

  udhcpc: no lease, failing
  FAIL
  checking http://169.254.169.254/2009-04-04/instance-id
  failed 1/20: up 181.90. request failed

  And the ovnmeta- namespaces for the networks that the VMs were booting
  from were missing. Looking into the ovn-metadata-agent.log:

  2023-04-18 06:56:09.864 353474 DEBUG neutron.agent.ovn.metadata.agent
  [-] There is no metadata port for network
  9029c393-5c40-4bf2-beec-27413417eafa or it has no MAC or IP addresses
  configured, tearing the namespace down if needed _get_provision_params
  /usr/lib/python3.9/site-
  packages/neutron/agent/ovn/metadata/agent.py:495

  Apparently, when the system is under stress (scalability tests) there
  are some edge cases where the metadata port information has not yet
  been propagated by OVN to the Southbound database; when the
  PortBindingChassisEvent event is handled and tries to find either
  the metadata port or the IP information on it (which is updated by
  ML2/OVN during subnet creation), it cannot be found and the handler
  fails silently with the error shown above.

  Note that running the same tests with less concurrency did not
  trigger this issue, so it only happens when the system is overloaded.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/2017748/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2086740] [NEW] Deadlock when metadata agent starts

2024-11-05 Thread Jakub Libosvar
Public bug reported:

There is a small chance that, if a port binding update event occurs right
after the SB IDL was instantiated and before the post_fork event is set,
the event match function accesses the sb_idl attribute, which then waits
indefinitely.

** Affects: neutron
 Importance: Undecided
 Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2086740

Title:
  Deadlock when metadata agent starts

Status in neutron:
  New

Bug description:
  There is a small chance that, if a port binding update event occurs
  right after the SB IDL was instantiated and before the post_fork event
  is set, the event match function accesses the sb_idl attribute, which
  then waits indefinitely.
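
  A minimal, self-contained sketch of that pattern (invented names; not the
  actual agent code): the match callback dereferences a property that blocks
  until post_fork is signalled, so an event arriving before the signal hangs
  the notification thread.

    import threading

    class Agent:
        def __init__(self):
            self.post_fork_event = threading.Event()
            self._sb_idl = None

        @property
        def sb_idl(self):
            # Blocks until post-fork initialization has completed.
            self.post_fork_event.wait()
            return self._sb_idl

    class PortBindingUpdatedEvent:
        def __init__(self, agent):
            self.agent = agent

        def match_fn(self, row):
            # Deadlocks if invoked before agent.post_fork_event.set().
            return self.agent.sb_idl is not None and bool(row.get('chassis'))

    if __name__ == '__main__':
        agent = Agent()
        event = PortBindingUpdatedEvent(agent)
        # Calling event.match_fn(...) here, before post_fork_event.set(),
        # would block forever inside the sb_idl property.
        agent._sb_idl = object()
        agent.post_fork_event.set()
        print(event.match_fn({'chassis': 'chassis-1'}))  # True, no hang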

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2086740/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp


[Yahoo-eng-team] [Bug 2086750] [NEW] VlanTransparencyTest.test_vlan_transparent_port_sec_disabled failed to create a server, n-api returns 500, "nova.exception.NovaException: Failed to access port" re

2024-11-05 Thread Ihar Hrachyshka
Public bug reported:

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_92f/931993/3/check/neutron-tempest-plugin-ovn/92fc4e3/testr_results.html

2024-11-06 01:24:07,246 90683 INFO [tempest.lib.common.rest_client] Request 
(VlanTransparencyTest:test_vlan_transparent_port_sec_disabled): 500 POST 
https://158.69.77.94/compute/v2.1/servers 300.064s
2024-11-06 01:24:07,247 90683 DEBUG[tempest.lib.common.rest_client] Request 
- Headers: {'Content-Type': 'application/json', 'Accept': 'application/json', 
'X-Auth-Token': ''}
Body: {"server": {"flavorRef": "d6da27ed-2673-4a67-9d11-2163c67f61d3", 
"imageRef": "89db8c48-81a7-4dd2-9192-c41d4cddabbf", "key_name": 
"tempest-VlanTransparencyTest-1374679314", "networks": [{"port": 
"a68e5f79-48b1-41bf-9de3-e1e713380ef1"}], "name": 
"server-tempest-VlanTransparencyTest-1374679314-0", "security_groups": 
[{"name": "default"}]}}
Response - Headers: {'date': 'Wed, 06 Nov 2024 01:19:07 GMT', 'server': 
'Apache/2.4.52 (Ubuntu)', 'content-length': '610', 'content-type': 'text/html; 
charset=iso-8859-1', 'connection': 'close', 'status': '500', 
'content-location': 'https://158.69.77.94/compute/v2.1/servers'}
Body: b'\n\n500 Internal Server 
Error\n\nInternal Server Error\nThe server 
encountered an internal error or\nmisconfiguration and was unable to 
complete\nyour request.\nPlease contact the server administrator at \n 
webmaster@localhost to inform them of the time this error occurred,\n and the 
actions you performed just before this error.\nMore information about 
this error may be available\nin the server error 
log.\n\nApache/2.4.52 (Ubuntu) Server at 158.69.77.94 Port 
80\n\n'
}}}

Traceback (most recent call last):
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_vlan_transparency.py",
 line 178, in test_vlan_transparent_port_sec_disabled
self._test_basic_vlan_transparency_connectivity(
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_vlan_transparency.py",
 line 133, in _test_basic_vlan_transparency_connectivity
vms.append(self._create_port_and_server(
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/test_vlan_transparency.py",
 line 86, in _create_port_and_server
return self.create_server(flavor_ref=self.flavor_ref,
  File 
"/opt/stack/tempest/.tox/tempest/lib/python3.10/site-packages/neutron_tempest_plugin/scenario/base.py",
 line 135, in create_server
server = client.create_server(
  File "/opt/stack/tempest/tempest/lib/services/compute/servers_client.py", 
line 119, in create_server
resp, body = self.post('servers', post_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 314, in post
resp_header, resp_body = self.request(
  File 
"/opt/stack/tempest/tempest/lib/services/compute/base_compute_client.py", line 
47, in request
resp, resp_body = super(BaseComputeClient, self).request(
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 762, in 
request
self._error_checker(resp, resp_body)
  File "/opt/stack/tempest/tempest/lib/common/rest_client.py", line 856, in 
_error_checker
raise exceptions.UnexpectedContentType(str(resp.status),
tempest.lib.exceptions.UnexpectedContentType: Unexpected content type provided
Details: 500

In n-api log:

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_92f/931993/3/check/neutron-tempest-plugin-ovn/92fc4e3/controller/logs/screen-n-api.txt

Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: DEBUG 
neutronclient.v2_0.client [None req-eb975fa0-ecd2-44dc-968c-b905124050d0 
tempest-VlanTransparencyTest-258921115 
tempest-VlanTransparencyTest-258921115-project-member] Error message: 
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: 
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: 500 
Internal Server Error
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: 
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: Internal 
Server Error
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: The 
server encountered an internal error or
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: 
misconfiguration and was unable to complete
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: your 
request.
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: Please 
contact the server administrator at
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]:  
webmaster@localhost to inform them of the time this error occurred,
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]:  and the 
actions you performed just before this error.
Nov 06 01:24:07.353465 np0038980715 devstack@n-api.service[59370]: More 
information about this error may be a

[Yahoo-eng-team] [Bug 2077897] Re: Cannot deploy Ubuntu 24.10

2024-11-05 Thread Frank Heimes
** Changed in: ubuntu-power-systems
   Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/2077897

Title:
  Cannot deploy Ubuntu 24.10

Status in cloud-init:
  Unknown
Status in MAAS:
  Invalid
Status in The Ubuntu-power-systems project:
  Fix Released

Bug description:
  Failed deployment when trying to install a POWER9 and POWER10 LPAR
  with Ubuntu Oracular Oriole 24.10:

  The following error after Rebooting step:

  ```
  [  OK  ] Finished snapd.seeded.service - Wait until snapd is fully seeded.
  [  OK  ] Reached target multi-user.target - Multi-User System.
  [  OK  ] Reached target graphical.target - Graphical Interface.
   Starting cloud-final.service - Cloud-init: Final Stage...
   Starting systemd-update-utmp-runle…- Record Runlevel Change in 
UTMP...
  [  OK  ] Finished systemd-update-utmp-runle…e - Record Runlevel Change in 
UTMP.
  [7.734835] cloud-init[1043]: 2024-08-26 13:24:42,056 - util.py[WARNING]: 
Can not apply stage final, no datasource found! Likely bad things to come!
  [7.735125] cloud-init[1043]: Can not apply stage final, no datasource 
found! Likely bad things to come!
  [7.735188] cloud-init[1043]: 

  [7.735314] cloud-init[1043]: Traceback (most recent call last):
  [7.735432] cloud-init[1043]:   File 
"/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 630, in 
main_modules
  [7.735550] cloud-init[1043]: init.fetch(existing="trust")
  [7.735706] cloud-init[1043]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 552, in fetch
  [7.735791] cloud-init[1043]: return 
self._get_data_source(existing=existing)
  [7.735843] cloud-init[1043]:

  [7.735902] cloud-init[1043]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 403, in 
_get_data_source
  [7.735961] cloud-init[1043]: raise e
  [7.736025] cloud-init[1043]:   File 
"/usr/lib/python3/dist-packages/cloudinit/stages.py", line 390, in 
_get_data_source
  [7.736135] cloud-init[1043]: ds, dsname = sources.find_source(
  [7.736227] cloud-init[1043]:  
  [7.736302] cloud-init[1043]:   File 
"/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 1063, in 
find_source
  [7.736382] cloud-init[1043]: raise DataSourceNotFoundException(msg)
  [7.736486] cloud-init[1043]: 
cloudinit.sources.DataSourceNotFoundException: Did not find any data source, 
searched classes: ()
  [7.736564] cloud-init[1043]: 

  [7.737018] sh[1789]: Completed socket interaction for boot stage final
  [FAILED] Failed to start cloud-final.service - Cloud-init: Final Stage.
  See 'systemctl status cloud-final.service' for details.
  [  OK  ] Reached target cloud-init.target - Cloud-init target.
  ```

  We are able to install the same partitions with Ubuntu 22.04 and 24.04 (via
  MAAS), and also able to boot Oracular 24.10 on the same partitions when
  installing via ISO (not via MAAS).

  Debian-based MAAS: 3.2.11

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/2077897/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp