[Yahoo-eng-team] [Bug 1348103] [NEW] nova to neutron port notification fails in cells environment
Public bug reported: When deploying OpenStack Icehouse on Ubuntu trusty in a cells configuration the callback from neutron to nova that notifies nova when a port for an instance is ready to be used seems to be lost. This causes the spawning instance to go into an ERROR state and the following in the nova-compute.log:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1714, in _spawn
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2266, in spawn
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3681, in _create_domain_and_network
    raise exception.VirtualInterfaceCreateException()
VirtualInterfaceCreateException: Virtual Interface creation failed

Adding "vif_plugging_is_fatal = False" and "vif_plugging_timeout = 5" to the compute nodes stops the missing message from being fatal and guests can then be spawned normally and accessed over the network. This issue doesn't present itself when deploying in a non-cell configuration.

I'll attach logs from attempting to spawn a new guest (at about 07:52) with: nova boot --image precise --flavor m1.small --key_name test --nic net-id=b77ca278-6e00-4530-94fe-c946a6046acf server075238 where dc31c58f-e455-4a1a-b825-6777ccb8d3c1 is the resulting guest id

nova-cells 1:2014.1.1-0ubuntu1 nova-api-ec2 1:2014.1.1-0ubuntu1 nova-api-os-compute 1:2014.1.1-0ubuntu1 nova-cert 1:2014.1.1-0ubuntu1 nova-common 1:2014.1.1-0ubuntu1 nova-conductor 1:2014.1.1-0ubuntu1 nova-objectstore 1:2014.1.1-0ubuntu1 nova-scheduler 1:2014.1.1-0ubuntu1 neutron-common 1:2014.1.1-0ubuntu2 neutron-plugin-ml2 1:2014.1.1-0ubuntu2 neutron-server 1:2014.1.1-0ubuntu2 neutron-plugin-openvswitch-agent 1:2014.1.1-0ubuntu2 openvswitch-common 2.0.1+git20140120-0ubuntu2 openvswitch-switch 2.0.1+git20140120-0ubuntu2 neutron-plugin-ml2 1:2014.1.1-0ubuntu2

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1348103

Title: nova to neutron port notification fails in cells environment

Status in OpenStack Compute (Nova): New

Bug description: When deploying OpenStack Icehouse on Ubuntu trusty in a cells configuration the callback from neutron to nova that notifies nova when a port for an instance is ready to be used seems to be lost. This causes the spawning instance to go into an ERROR state and the following in the nova-compute.log:

Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1714, in _spawn
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 2266, in spawn
    block_device_info)
  File "/usr/lib/python2.7/dist-packages/nova/virt/libvirt/driver.py", line 3681, in _create_domain_and_network
    raise exception.VirtualInterfaceCreateException()
VirtualInterfaceCreateException: Virtual Interface creation failed

Adding "vif_plugging_is_fatal = False" and "vif_plugging_timeout = 5" to the compute nodes stops the missing message from being fatal and guests can then be spawned normally and accessed over the network. This issue doesn't present itself when deploying in a non-cell configuration.

I'll attach logs from attempting to spawn a new guest (at about 07:52) with: nova boot --image precise --flavor m1.small --key_name test --nic net-id=b77ca278-6e00-4530-94fe-c946a6046acf server075238 where dc31c58f-e455-4a1a-b825-6777ccb8d3c1 is the resulting guest id

nova-cells 1:2014.1.1-0ubuntu1 nova-api-ec2 1:2014.1.1-0ubuntu1 nova-api-os-compute 1:2014.1.1-0ubuntu1 nova-cert 1:2014.1.1-0ubuntu1 nova-common 1:2014.1.1-0ubuntu1 nova-conductor 1:2014.1.1-0ubuntu1 nova-objectstore 1:2014.1.1-0ubuntu1 nova-scheduler 1:2014.1.1-0ubuntu1 neutron-common 1:2014.1.1-0ubuntu2 neutron-plugin-ml2 1:2014.1.1-0ubuntu2 neutron-server 1:2014.1.1-0ubuntu2
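For reference, the workaround quoted in this report corresponds to a nova.conf excerpt on the compute nodes along the following lines. The [DEFAULT] section placement is an assumption on my part (the report only quotes the two option names), and the settings only mask the lost notification rather than fix it; a 5 second timeout simply gives up waiting very quickly.

[DEFAULT]
# Do not put the instance into ERROR when the network-vif-plugged
# event never arrives; log and continue instead.
vif_plugging_is_fatal = False
# How long to wait for the event before giving up (seconds).
vif_plugging_timeout = 5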
[Yahoo-eng-team] [Bug 1359805] [NEW] 'Requested operation is not valid: domain is not running' from check-tempest-dsvm-neutron-full
Public bug reported: I received the following error from the check-tempest-dsvm-neutron-full test suite after submitting a nova patch:

2014-08-21 14:11:25.059 | Captured traceback:
2014-08-21 14:11:25.059 | ~~~
2014-08-21 14:11:25.059 | Traceback (most recent call last):
2014-08-21 14:11:25.059 |   File "tempest/api/compute/servers/test_server_actions.py", line 407, in test_suspend_resume_server
2014-08-21 14:11:25.059 |     self.client.wait_for_server_status(self.server_id, 'SUSPENDED')
2014-08-21 14:11:25.059 |   File "tempest/services/compute/xml/servers_client.py", line 390, in wait_for_server_status
2014-08-21 14:11:25.059 |     raise_on_error=raise_on_error)
2014-08-21 14:11:25.059 |   File "tempest/common/waiters.py", line 77, in wait_for_server_status
2014-08-21 14:11:25.059 |     server_id=server_id)
2014-08-21 14:11:25.059 | BuildErrorException: Server a29ec7be-be83-4247-b7db-49bd4727d206 failed to build and is in ERROR status
2014-08-21 14:11:25.059 | Details: {'message': 'Requested operation is not valid: domain is not running', 'code': '500', 'details': 'None', 'created': '2014-08-21T13:49:49Z'}

** Affects: neutron
   Importance: Undecided
   Status: New

** Attachment added: "check-tempest-dsvm-neutron-full-console.txt"
   https://bugs.launchpad.net/bugs/1359805/+attachment/4183601/+files/check-tempest-dsvm-neutron-full-console.txt

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1359805

Title: 'Requested operation is not valid: domain is not running' from check-tempest-dsvm-neutron-full

Status in OpenStack Neutron (virtual network service): New

Bug description: I received the following error from the check-tempest-dsvm-neutron-full test suite after submitting a nova patch:

2014-08-21 14:11:25.059 | Captured traceback:
2014-08-21 14:11:25.059 | ~~~
2014-08-21 14:11:25.059 | Traceback (most recent call last):
2014-08-21 14:11:25.059 |   File "tempest/api/compute/servers/test_server_actions.py", line 407, in test_suspend_resume_server
2014-08-21 14:11:25.059 |     self.client.wait_for_server_status(self.server_id, 'SUSPENDED')
2014-08-21 14:11:25.059 |   File "tempest/services/compute/xml/servers_client.py", line 390, in wait_for_server_status
2014-08-21 14:11:25.059 |     raise_on_error=raise_on_error)
2014-08-21 14:11:25.059 |   File "tempest/common/waiters.py", line 77, in wait_for_server_status
2014-08-21 14:11:25.059 |     server_id=server_id)
2014-08-21 14:11:25.059 | BuildErrorException: Server a29ec7be-be83-4247-b7db-49bd4727d206 failed to build and is in ERROR status
2014-08-21 14:11:25.059 | Details: {'message': 'Requested operation is not valid: domain is not running', 'code': '500', 'details': 'None', 'created': '2014-08-21T13:49:49Z'}

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1359805/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1314677] Re: nova-cells fails when using JSON file to store cell information
** Also affects: nova (Ubuntu)
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1314677

Title: nova-cells fails when using JSON file to store cell information

Status in OpenStack Compute (Nova): Fix Released
Status in “nova” package in Ubuntu: New
Status in “nova” source package in Trusty: New

Bug description: As recommended in http://docs.openstack.org/havana/config-reference/content/section_compute-cells.html#cell-config-optional-json I'm creating the nova-cells config with the cell information stored in a json file. However, when I do this nova-cells fails to start with this error in the logs:

2014-04-29 11:52:05.240 16759 CRITICAL nova [-] __init__() takes exactly 3 arguments (1 given)
2014-04-29 11:52:05.240 16759 TRACE nova Traceback (most recent call last):
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/bin/nova-cells", line 10, in <module>
2014-04-29 11:52:05.240 16759 TRACE nova     sys.exit(main())
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/cmd/cells.py", line 40, in main
2014-04-29 11:52:05.240 16759 TRACE nova     manager=CONF.cells.manager)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 257, in create
2014-04-29 11:52:05.240 16759 TRACE nova     db_allowed=db_allowed)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 139, in __init__
2014-04-29 11:52:05.240 16759 TRACE nova     self.manager = manager_class(host=self.host, *args, **kwargs)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/cells/manager.py", line 87, in __init__
2014-04-29 11:52:05.240 16759 TRACE nova     self.state_manager = cell_state_manager()
2014-04-29 11:52:05.240 16759 TRACE nova TypeError: __init__() takes exactly 3 arguments (1 given)

I have had a dig into the code and it appears that CellsManager creates an instance of CellStateManager with no arguments. CellStateManager __new__ runs and creates an instance of CellStateManagerFile which runs __new__ and __init__ with cell_state_cls and cells_config_path set. At this point __new__ returns CellStateManagerFile and the new instance's __init__() method is invoked (CellStateManagerFile.__init__) with the original arguments (there weren't any) which then results in the stack trace.

It seems reasonable for CellStateManagerFile to derive the cells_config_path info for itself, so I've patched it locally with:

=== modified file 'state.py'
--- state.py    2014-04-30 15:10:16 +0000
+++ state.py    2014-04-30 15:10:26 +0000
@@ -155,7 +155,7 @@
             config_path = CONF.find_file(cells_config)
             if not config_path:
                 raise cfg.ConfigFilesNotFoundError(config_files=[cells_config])
-            return CellStateManagerFile(cell_state_cls, config_path)
+            return CellStateManagerFile(cell_state_cls)
         return CellStateManagerDB(cell_state_cls)

@@ -450,7 +450,9 @@
 class CellStateManagerFile(CellStateManager):
-    def __init__(self, cell_state_cls, cells_config_path):
+    def __init__(self, cell_state_cls=None):
+        cells_config = CONF.cells.cells_config
+        cells_config_path = CONF.find_file(cells_config)
         self.cells_config_path = cells_config_path
         super(CellStateManagerFile, self).__init__(cell_state_cls)

Ubuntu: 14.04
nova-cells: 1:2014.1-0ubuntu1

nova.conf:
[DEFAULT]
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/var/lock/nova
force_dhcp_release=True
iscsi_helper=tgtadm
libvirt_use_virtio_for_bridges=True
connection_type=libvirt
root_helper=sudo nova-rootwrap /etc/nova/rootwrap.conf
verbose=True
ec2_private_dns_show_ip=True
api_paste_config=/etc/nova/api-paste.ini
volumes_path=/var/lib/nova/volumes
enabled_apis=ec2,osapi_compute,metadata
auth_strategy=keystone
compute_driver=libvirt.LibvirtDriver
quota_driver=nova.quota.NoopQuotaDriver

[cells]
enable=True
name=cell
cell_type=compute
cells_config=/etc/nova/cells.json

cells.json:
{
    "parent": {
        "name": "parent",
        "api_url": "http://api.example.com:8774",
        "transport_url": "rabbit://rabbit.example.com",
        "weight_offset": 0.0,
        "weight_scale": 1.0,
        "is_parent": true
    }
}

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1314677/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
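The __new__/__init__ interaction described in this report can be reduced to a short, self-contained sketch (the class bodies below are simplified stand-ins, not nova's real code): when __new__ returns an already-initialised instance, Python still invokes that instance's __init__ with the arguments of the original call, which here were none.

class CellStateManager(object):
    def __new__(cls, *args, **kwargs):
        if cls is CellStateManager:
            # Factory branch: build the file-backed subclass with arguments
            # this method works out for itself.
            return CellStateManagerFile(None, '/etc/nova/cells.json')
        return super(CellStateManager, cls).__new__(cls)

    def __init__(self, cell_state_cls=None):
        self.cell_state_cls = cell_state_cls


class CellStateManagerFile(CellStateManager):
    def __init__(self, cell_state_cls, cells_config_path):  # needs 3 arguments
        self.cells_config_path = cells_config_path
        super(CellStateManagerFile, self).__init__(cell_state_cls)


# The inner CellStateManagerFile(...) call above succeeds, but because the
# object returned by __new__ is an instance of the requested class, Python
# calls __init__ on it again with the original (empty) argument list:
CellStateManager()
# TypeError: __init__() takes exactly 3 arguments (1 given)  (Python 2 wording)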
[Yahoo-eng-team] [Bug 1943863] Re: DPDK instances are failing to start: Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file or directory
https://github.com/openstack-charmers/charm-layer-ovn/pull/52 ** Also affects: neutron Importance: Undecided Status: New ** No longer affects: neutron ** No longer affects: neutron (Ubuntu) ** Also affects: charm-layer-ovn Importance: Undecided Status: New ** Changed in: charm-layer-ovn Status: New => Confirmed ** Changed in: charm-layer-ovn Importance: Undecided => High ** Changed in: charm-layer-ovn Assignee: (unassigned) => Liam Young (gnuoy) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1943863 Title: DPDK instances are failing to start: Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file or directory Status in charm-layer-ovn: Confirmed Status in OpenStack nova-compute charm: Invalid Bug description: == Env focal/ussuri + ovn, latest stable charms juju status: https://paste.ubuntu.com/p/2725tV47ym/ Hardware: Huawei CH121 V5 with MZ532,4*25GE Mezzanine Card,PCIE 3.0 X16 NICs + manually installed PMD for DPDK enablement (librte-pmd-hinic20.0 package) == Problem description DPDK instance can't be launched after the fresh deployment (focal/ussuri + OVN, latest stable charms), raising a below error: $ os server show dpdk-test-instance -f yaml OS-DCF:diskConfig: MANUAL OS-EXT-AZ:availability_zone: '' OS-EXT-SRV-ATTR:host: null OS-EXT-SRV-ATTR:hypervisor_hostname: null OS-EXT-SRV-ATTR:instance_name: instance-0218 OS-EXT-STS:power_state: NOSTATE OS-EXT-STS:task_state: null OS-EXT-STS:vm_state: error OS-SRV-USG:launched_at: null OS-SRV-USG:terminated_at: null accessIPv4: '' accessIPv6: '' addresses: '' config_drive: 'True' created: '2021-09-15T18:51:00Z' fault: code: 500 created: '2021-09-15T18:52:01Z' details: "Traceback (most recent call last):\n File \"/usr/lib/python3/dist-packages/nova/conductor/manager.py\"\ , line 651, in build_instances\nscheduler_utils.populate_retry(\n File \"\ /usr/lib/python3/dist-packages/nova/scheduler/utils.py\", line 919, in populate_retry\n\ \raise exception.MaxRetriesExceeded(reason=msg)\nnova.exception.MaxRetriesExceeded:\ \ Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance\ \ 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error: process\ \ exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64:\ \ -chardev socket,id=charnet0,path=/run/libvirt-vhost-user/vhu3ba44fdc-7c,server:\ \ Failed to bind socket to /run/libvirt-vhost-user/vhu3ba44fdc-7c: No such file\ \ or directory\n" message: 'Exceeded maximum number of retries. Exceeded max scheduling attempts 3 for instance 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73. Last exception: internal error: process exited while connecting to monitor: 2021-09-15T18:51:53.485265Z qemu-system-x86_64: -chardev ' flavor: m1.medium.project.dpdk (4f452aa3-2b2c-4f2e-8465-5e3c2d8ec3f1) hostId: '' id: 1bb2d1b7-e2e9-4d76-a346-a9b06ff22c73 image: auto-sync/ubuntu-bionic-18.04-amd64-server-20210907-disk1.img (3851450e-e73d-489b-a356-33650690ed7a) key_name: ubuntu-keypair name: dpdk-test-instance project_id: cdade870811447a89e2f0199373a0d95 properties: '' status: ERROR updated: '2021-09-15T18:52:01Z' user_id: 13a0e7862c6641eeaaebbde1ae096f9e volumes_attached: '' For the record, a "generic" instances (e.g non-DPDK/non-SRIOV) are scheduling/starting without any issues. 
== Steps to reproduce openstack network create --external --provider-network-type vlan --provider-segment xxx --provider-physical-network dpdkfabric ext_net_dpdk openstack subnet create --allocation-pool start=,end= --network ext_net_dpdk --subnet-range /23 --gateway --no-dhcp ext_net_dpdk_subnet openstack aggregate create --zone nova dpdk openstack aggregate set --property dpdk=true dpdk openstack aggregate add host dpdk openstack aggregate show dpdk --max-width=80 openstack flavor set --property aggregate_instance_extra_specs:dpdk=true --property hw:mem_page_size=large m1.medium.dpdk openstack server create --config-drive true --network ext_net_dpdk --key-name ubuntu-keypair --image focal --flavor m1.medium.dpdk dpdk- test-instance == Analysis [before redeployment] nova-compute log : https://pastebin.canonical.com/p/FgPYNb3bPj/ [fresh deployment] juju crashdump: https://drive.google.com/file/d/1W_w3CAUq4ggp4alDnpCk08mSaCL6Uaxk/view?usp=sharing # ovs-vsctl get open_vswitch . other_config {dpdk-extra="--pci-whitelist :3e:00.0 --pci-whitelist :40:00.0", dpdk-init="true", dpdk-lco
[Yahoo-eng-team] [Bug 1964117] [NEW] Unable to contact to IPv6 instance using ml2 ovs with ovs 2.16
Public bug reported: Connectivity is fine with OVS 2.15 but after upgrading ovs, connectivity is lost to remote units over ipv6. The traffic appears to be lost while being processed by the openflow firewall associated with br-int. The description below uses connectivity between Octavia units and amphora to illustrate the issue but I don't think this issue is related to Octavia. OS: Ubuntu Focal OVS: 2.16.0-0ubuntu2.1~cloud0 Kernel: 5.4.0-100-generic With a fresh install of xena or after an upgrade of OVS from 2.15 (wallaby) to 2.16 (xena) connectivity from the octavia units to the amphora is broken. * Wallaby works as expected * Disabling port security on the octavia units octavia-health-manager-octavia-N-listen-port restores connectivity. * The flows on br-int and br-tun are the same after the upgrade from 2.15 to 2.16 * Manually inserting permissive flows into the br-int flow table also restores connectivity. * Testing environment is Openstack on top of Openstack. Text below is reproduced here https://pastebin.ubuntu.com/p/hRWMx7d9HG/ as it maybe easier to read in a pastebin. Below is reproduction of the issue first deploying wallaby to validate connectivity before upgrading openvswitch. Amphora: $ openstack loadbalancer amphora list +--+--+---++-+-+ | id | loadbalancer_id | status| role | lb_network_ip | ha_ip | +--+--+---++-+-+ | 30afe97a-bcd4-4537-a621-830de87568b0 | ae840c86-768d-4aae-b804-8fddf2880c78 | ALLOCATED | MASTER | fc00:92e3:d18a:36ed:f816:3eff:fed2:32e0 | 10.42.0.254 | | 61e66eff-e83b-4a21-bc1f-1e1a0037b191 | ae840c86-768d-4aae-b804-8fddf2880c78 | ALLOCATED | BACKUP | fc00:92e3:d18a:36ed:f816:3eff:fe69:c85b | 10.42.0.254 | +--+--+---++-+-+ $ openstack router show lb-mgmt -c name -c interfaces_info +-+---+ | Field | Value | +-+---+ | interfaces_info | [{"port_id": "191a2d27-9b15-4938-a818-b48fc405a27a", "ip_address": "fc00:92e3:d18a:36ed::", "subnet_id": "8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03"}] | | name| lb-mgmt | +-+---+ Looking at ports on that subnet there is a port for each of the octavia units (named octavia-health-manager-octavia-N-listen-port ), a port on each of the amphora listed above and a port for the lb-mgmt router. $ openstack port list | grep 8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03 | 0943521f-2c1f-4152-8250-48d310e3918f | octavia-health-manager-octavia-1-listen-port | fa:16:3e:70:70:c9 | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fe70:70c9', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 160b8854-0f20-471b-9ac4-53f8891f4edb | | fa:16:3e:45:7a:a6 | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fe45:7aa6', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 191a2d27-9b15-4938-a818-b48fc405a27a | | fa:16:3e:3e:bd:45 | ip_address='fc00:92e3:d18a:36ed::', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 2428b1d4-0cb2-420b-81a5-5e6ae34e4557 | octavia-health-manager-octavia-2-listen-port | fa:16:3e:05:f3:2a | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fe05:f32a', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 2ea37e19-bd60-43cb-8191-aaf179667b1a | | fa:16:3e:d2:32:e0 | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fed2:32e0', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 76742ab6-39ee-4b06-a37d-f2ecad2c892a | octavia-health-manager-octavia-0-listen-port | fa:16:3e:79:b6:46 | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fe79:b646', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | ffb3d106-7a14-4b4e-8300-2dd9ec9bc
[Yahoo-eng-team] [Bug 1964117] Re: Unable to contact to IPv6 instance using ml2 ovs with ovs 2.16
The issue seems to be in ovs, specifically this commit https://github.com/openvswitch/ovs/commit/355fef6f2ccbcf78797b938421cb4cef9b59af13 . I have created a ppa https://launchpad.net/~gnuoy/+archive/ubuntu/focal-xena/+packages that has a copy of the openvswitch package from the xena-proposed UCA. The only change I have made is backing out that commit (and temporarily disabling auto pkg tests). The following pastebin shows: 1) checking connectivity with ovs 2.15 2) upgrading to 2.16 and seeing that connectivity is broken 3) upgrading to 2.16 with 355fef6f2 reverted and seeing connectivity is restored https://paste.ubuntu.com/p/nSHjRZzbmp/ ** Also affects: openvswitch Importance: Undecided Status: New ** Changed in: neutron Status: New => Invalid -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron. https://bugs.launchpad.net/bugs/1964117 Title: Unable to contact to IPv6 instance using ml2 ovs with ovs 2.16 Status in neutron: Invalid Status in openvswitch: New Bug description: Connectivity is fine with OVS 2.15 but after upgrading ovs, connectivity is lost to remote units over ipv6. The traffic appears to be lost while being processed by the openflow firewall associated with br-int. The description below uses connectivity between Octavia units and amphora to illustrate the issue but I don't think this issue is related to Octavia. OS: Ubuntu Focal OVS: 2.16.0-0ubuntu2.1~cloud0 Kernel: 5.4.0-100-generic With a fresh install of xena or after an upgrade of OVS from 2.15 (wallaby) to 2.16 (xena) connectivity from the octavia units to the amphora is broken. * Wallaby works as expected * Disabling port security on the octavia units octavia-health-manager-octavia-N-listen-port restores connectivity. * The flows on br-int and br-tun are the same after the upgrade from 2.15 to 2.16 * Manually inserting permissive flows into the br-int flow table also restores connectivity. * Testing environment is Openstack on top of Openstack. Text below is reproduced here https://pastebin.ubuntu.com/p/hRWMx7d9HG/ as it maybe easier to read in a pastebin. Below is reproduction of the issue first deploying wallaby to validate connectivity before upgrading openvswitch. Amphora: $ openstack loadbalancer amphora list +--+--+---++-+-+ | id | loadbalancer_id | status| role | lb_network_ip | ha_ip | +--+--+---++-+-+ | 30afe97a-bcd4-4537-a621-830de87568b0 | ae840c86-768d-4aae-b804-8fddf2880c78 | ALLOCATED | MASTER | fc00:92e3:d18a:36ed:f816:3eff:fed2:32e0 | 10.42.0.254 | | 61e66eff-e83b-4a21-bc1f-1e1a0037b191 | ae840c86-768d-4aae-b804-8fddf2880c78 | ALLOCATED | BACKUP | fc00:92e3:d18a:36ed:f816:3eff:fe69:c85b | 10.42.0.254 | +--+--+---++-+-+ $ openstack router show lb-mgmt -c name -c interfaces_info +-+---+ | Field | Value | +-+---+ | interfaces_info | [{"port_id": "191a2d27-9b15-4938-a818-b48fc405a27a", "ip_address": "fc00:92e3:d18a:36ed::", "subnet_id": "8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03"}] | | name| lb-mgmt | +-+---+ Looking at ports on that subnet there is a port for each of the octavia units (named octavia-health-manager-octavia-N-listen-port ), a port on each of the amphora listed above and a port for the lb-mgmt router. $ openstack port list | grep 8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03 | 0943521f-2c1f-4152-8250-48d310e3918f | octavia-health-manager-octavia-1-listen-port | fa:16:3e:70:70:c9 | ip_address='fc00:92e3:d18a:36ed:f816:3eff:fe70:70c9', subnet_id='8b4307a7-08a1-4f2b-a7e0-ce45a7ad0b03' | ACTIVE | | 160b8854-0f20-471b-9ac4-53f8891f4edb |
[Yahoo-eng-team] [Bug 1826382] Re: Updates to placement api fail if placement endpoint changes
pute.manager self._update_to_placement(context, compute_node) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 912, in _update_to_placement 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager context, compute_node.uuid, name=compute_node.hypervisor_hostname) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/scheduler/client/__init__.py", line 35, in __run_method 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager return getattr(self.instance, __name)(*args, **kwargs) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 1006, in get_provider_tree_and_ensure_root 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager parent_provider_uuid=parent_provider_uuid) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 668, in _ensure_resource_provider 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager rps_to_refresh = self._get_providers_in_tree(context, uuid) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 74, in wrapper 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager return f(self, *a, **k) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/scheduler/client/report.py", line 535, in _get_providers_in_tree 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager raise exception.ResourceProviderRetrievalFailed(uuid=uuid) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager nova.exception.ResourceProviderRetrievalFailed: Failed to get resource provider with UUID 4f7c6844-d3b8-4710-be2c-8691a93fb58b 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager ** Also affects: nova Importance: Undecided Status: New ** Changed in: nova Assignee: (unassigned) => Liam Young (gnuoy) -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1826382 Title: Updates to placement api fail if placement endpoint changes Status in OpenStack nova-compute charm: New Status in OpenStack Compute (nova): New Bug description: If the url of the placement api changes after nova-compute has been started then placement updates fail as nova-compute appears to cache the old endpoint url. To reproduce, update the placement endpoint to something incorrect in keystone and restart nova-compute. Errors contacting the placement api will be reported every minute or so. Now, correct the entry in keystone. The errors will continue despite the catalogue now being correct. Restarting nova-compute fixes the issue. In my deployment this occurred when the placement end point switched from http to https after the nova-compute node had started. This resulted in the following in the nova-compute log: 2019-04-25 09:58:12.175 31793 ERROR nova.scheduler.client.report [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] [None] Failed to retrieve resource provider tree from placement API for UUID 4f7c6844-d3b8-4710-be2c-8691a93fb58b. Got 400: 400 Bad Request Bad Request Your browser sent a request that this server could not understand. Reason: You're speaking plain HTTP to an SSL-enabled server port. Instead use the HTTPS scheme to access this URL, please. 
Apache/2.4.29 (Ubuntu) Server at 10.5.0.36 Port 443 . 2019-04-25 09:58:12.176 31793 DEBUG oslo_concurrency.lockutils [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] Lock "compute_resources" released by "nova.compute.resource_tracker.ResourceTracker._update_available_resource" :: held 0.099s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:285 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] Error updating resources for node juju-7a9f5c-zaza-19a393f3689b-16.project.serverstack.: nova.exception.ResourceProviderRetrievalFailed: Failed to get resource provider with UUID 4f7c6844-d3b8-4710-be2c-8691a93fb58b 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager Traceback (most recent call last): 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 7778, in _update_available_resource_for_node 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager rt.update_available_resource(context, nodename) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/
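The caching behaviour described in this report can be illustrated with a small, self-contained sketch using hypothetical names (nova's real client goes through keystoneauth, but the effect of resolving the endpoint once at start-up is the same): later catalog changes are invisible until the process restarts and the client is rebuilt.

class FakeCatalog(object):
    """Stand-in for the keystone service catalog."""
    def __init__(self):
        self.placement_url = 'http://10.5.0.36:8778'

    def url_for(self, service_type):
        return self.placement_url


class CachingPlacementClient(object):
    def __init__(self, catalog):
        # Endpoint looked up once and kept for the life of the process.
        self._endpoint = catalog.url_for('placement')

    def url(self, path):
        return self._endpoint + path


catalog = FakeCatalog()
client = CachingPlacementClient(catalog)
catalog.placement_url = 'https://10.5.0.36'   # catalog later switches to https
print(client.url('/resource_providers'))      # still the stale http:// endpoint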
[Yahoo-eng-team] [Bug 1826382] Re: Updates to placement api fail if placement endpoint changes
Work around in the charm committed here: https://review.opendev.org/#/c/755089 https://review.opendev.org/#/c/755076/ ** Also affects: charm-keystone Importance: Undecided Status: New ** Also affects: charm-nova-cloud-controller Importance: Undecided Status: New ** Changed in: nova Assignee: Liam Young (gnuoy) => (unassigned) ** Changed in: charm-keystone Assignee: (unassigned) => Liam Young (gnuoy) ** Changed in: charm-nova-cloud-controller Assignee: (unassigned) => Liam Young (gnuoy) ** Changed in: charm-nova-compute Status: Triaged => Invalid ** Changed in: charm-keystone Status: New => Fix Committed ** Changed in: charm-nova-cloud-controller Status: New => Fix Committed ** Changed in: charm-keystone Importance: Undecided => High ** Changed in: charm-nova-cloud-controller Importance: Undecided => High -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1826382 Title: Updates to placement api fail if placement endpoint changes Status in OpenStack keystone charm: Fix Committed Status in OpenStack nova-cloud-controller charm: Fix Committed Status in OpenStack nova-compute charm: Invalid Status in OpenStack Compute (nova): Triaged Bug description: If the url of the placement api changes after nova-compute has been started then placement updates fail as nova-compute appears to cache the old endpoint url. To reproduce, update the placement endpoint to something incorrect in keystone and restart nova-compute. Errors contacting the placement api will be reported every minute or so. Now, correct the entry in keystone. The errors will continue despite the catalogue now being correct. Restarting nova-compute fixes the issue. In my deployment this occurred when the placement end point switched from http to https after the nova-compute node had started. This resulted in the following in the nova-compute log: 2019-04-25 09:58:12.175 31793 ERROR nova.scheduler.client.report [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] [None] Failed to retrieve resource provider tree from placement API for UUID 4f7c6844-d3b8-4710-be2c-8691a93fb58b. Got 400: 400 Bad Request Bad Request Your browser sent a request that this server could not understand. Reason: You're speaking plain HTTP to an SSL-enabled server port. Instead use the HTTPS scheme to access this URL, please. Apache/2.4.29 (Ubuntu) Server at 10.5.0.36 Port 443 . 
2019-04-25 09:58:12.176 31793 DEBUG oslo_concurrency.lockutils [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] Lock "compute_resources" released by "nova.compute.resource_tracker.ResourceTracker._update_available_resource" :: held 0.099s inner /usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py:285 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager [req-18b4f522-e702-4ee1-ba85-e565c8e9ac1e - - - - -] Error updating resources for node juju-7a9f5c-zaza-19a393f3689b-16.project.serverstack.: nova.exception.ResourceProviderRetrievalFailed: Failed to get resource provider with UUID 4f7c6844-d3b8-4710-be2c-8691a93fb58b 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager Traceback (most recent call last): 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/manager.py", line 7778, in _update_available_resource_for_node 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager rt.update_available_resource(context, nodename) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 721, in update_available_resource 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager self._update_available_resource(context, resources) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager return f(*args, **kwargs) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/nova/compute/resource_tracker.py", line 798, in _update_available_resource 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager self._update(context, cn) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 49, in wrapped_f 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager return Retrying(*dargs, **dkw).call(f, *args, **kw) 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager File "/usr/lib/python3/dist-packages/retrying.py", line 206, in call 2019-04-25 09:58:12.177 31793 ERROR nova.compute.manager
[Yahoo-eng-team] [Bug 1896603] Re: ovn-octavia-provider: Cannot create listener due to allowed_cidrs validation
** Also affects: ovn-octavia-provider (Ubuntu)
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1896603

Title: ovn-octavia-provider: Cannot create listener due to allowed_cidrs validation

Status in neutron: Fix Released
Status in ovn-octavia-provider package in Ubuntu: New

Bug description: Kuryr-Kubernetes tests running with ovn-octavia-provider started to fail with "Provider 'ovn' does not support a requested option: OVN provider does not support allowed_cidrs option" showing up in the o-api logs. We've tracked that to check [1] getting introduced. Apparently it's broken and makes the request explode even if the property isn't set at all. Please take a look at output from python-openstackclient [2] where the body I used is just '{"listener": {"loadbalancer_id": "faca9a1b-30dc-45cb-80ce-2ab1c26b5521", "protocol": "TCP", "protocol_port": 80, "admin_state_up": true}}'. Also this is all over your gates as well, see o-api log [3]. Somehow ovn-octavia-provider tests skip 171 results there, so that's why it's green.

[1] https://opendev.org/openstack/ovn-octavia-provider/src/branch/master/ovn_octavia_provider/driver.py#L142
[2] http://paste.openstack.org/show/798197/
[3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_4ba/751085/7/gate/ovn-octavia-provider-v2-dsvm-scenario/4bac575/controller/logs/screen-o-api.txt

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1896603/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
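The failure mode described here can be illustrated with a small sketch using hypothetical names (this is not the ovn-octavia-provider source): an "unset" sentinel object is truthy by default in Python, so a bare truthiness check raises even when the caller never supplied the option, which matches the symptom above.

class Unset(object):
    """Hypothetical sentinel meaning 'option not supplied by the caller'."""


def check_allowed_cidrs_broken(allowed_cidrs):
    # A plain object instance is truthy, so this raises even for "unset".
    if allowed_cidrs:
        raise ValueError('OVN provider does not support allowed_cidrs option')


def check_allowed_cidrs_fixed(allowed_cidrs):
    # Treat the sentinel as "not set" before checking the real value.
    if isinstance(allowed_cidrs, Unset):
        allowed_cidrs = None
    if allowed_cidrs:
        raise ValueError('OVN provider does not support allowed_cidrs option')


check_allowed_cidrs_fixed(Unset())    # passes: the option was never supplied
check_allowed_cidrs_broken(Unset())   # raises, mirroring the reported error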
[Yahoo-eng-team] [Bug 1785235] [NEW] metadata retrieval fails when using a global nova-api-metadata service
Public bug reported:

Description
===
The nova-api-metadata service fails to provide metadata to guests when it is providing metadata for multiple cells.

Steps to reproduce
==
Deploy an environment with multiple cells and a single nova-api-metadata service. Requests by the guests for metadata will fail.

Expected result
===
Guests would get metadata.

Actual result
=
Guests do not get metadata, they get a 404.

Environment
===
1. Exact version of OpenStack you are running.
$ dpkg -l | grep nova-comm
ii nova-common 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - common files
2. Which hypervisor did you use? Libvirt + KVM
2. Which storage type did you use? n/a
3. Which networking type did you use? Neutron with OpenVSwitch

** Affects: nova
   Importance: Undecided
   Assignee: Liam Young (gnuoy)
   Status: New

** Changed in: nova
   Assignee: (unassigned) => Liam Young (gnuoy)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785235

Title: metadata retrieval fails when using a global nova-api-metadata service

Status in OpenStack Compute (nova): New

Bug description:
Description
===
The nova-api-metadata service fails to provide metadata to guests when it is providing metadata for multiple cells.

Steps to reproduce
==
Deploy an environment with multiple cells and a single nova-api-metadata service. Requests by the guests for metadata will fail.

Expected result
===
Guests would get metadata.

Actual result
=
Guests do not get metadata, they get a 404.

Environment
===
1. Exact version of OpenStack you are running.
$ dpkg -l | grep nova-comm
ii nova-common 2:17.0.5-0ubuntu1~cloud0 all OpenStack Compute - common files
2. Which hypervisor did you use? Libvirt + KVM
2. Which storage type did you use? n/a
3. Which networking type did you use? Neutron with OpenVSwitch

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785235/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1785237] [NEW] The section on the Neutron Metadata API proxy in cellsv2-layout.html is confusing and possibly wrong
Public bug reported: I found it hard to understand what configuration was needed when reading:

"The Neutron metadata API proxy should be global across all cells, and thus be configured as an API-level service with access to the [api_database]/connection information."

Which service is it referring to: ns-metadata-proxy, neutron-metadata-agent or nova-api-metadata? Given that the 'api_database' section is only valid for nova, that would suggest it's the nova-api-metadata, but the nova-api-metadata receives all its data via rpc (as far as I can tell) so it doesn't seem to need the api_database section.

** Affects: nova
   Importance: Undecided
   Assignee: Liam Young (gnuoy)
   Status: In Progress

** Changed in: nova
   Assignee: (unassigned) => Liam Young (gnuoy)

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1785237

Title: The section on the Neutron Metadata API proxy in cellsv2-layout.html is confusing and possibly wrong

Status in OpenStack Compute (nova): In Progress

Bug description: I found it hard to understand what configuration was needed when reading:

"The Neutron metadata API proxy should be global across all cells, and thus be configured as an API-level service with access to the [api_database]/connection information."

Which service is it referring to: ns-metadata-proxy, neutron-metadata-agent or nova-api-metadata? Given that the 'api_database' section is only valid for nova, that would suggest it's the nova-api-metadata, but the nova-api-metadata receives all its data via rpc (as far as I can tell) so it doesn't seem to need the api_database section.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1785237/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1815844] Re: iscsi multipath dm-N device only used on first volume attachment
I don't think this is related to the charm, it looks like a bug in upstream nova. ** Also affects: nova (Ubuntu) Importance: Undecided Status: New ** No longer affects: nova (Ubuntu) ** Also affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1815844 Title: iscsi multipath dm-N device only used on first volume attachment Status in OpenStack nova-compute charm: New Status in OpenStack Compute (nova): New Bug description: With nova-compute from cloud:xenial-queens and use-multipath=true iscsi multipath is configured and the dm-N devices used on the first attachment but subsequent attachments only use a single path. The back-end storage is a Purestorage array. The multipath.conf is attached The issue is easily reproduced as shown below: jog@pnjostkinfr01:~⟫ openstack volume create pure2 --size 10 --type pure +-+--+ | Field | Value| +-+--+ | attachments | [] | | availability_zone | nova | | bootable| false| | consistencygroup_id | None | | created_at | 2019-02-13T23:07:40.00 | | description | None | | encrypted | False| | id | e286161b-e8e8-47b0-abe3-4df411993265 | | migration_status| None | | multiattach | False| | name| pure2| | properties | | | replication_status | None | | size| 10 | | snapshot_id | None | | source_volid| None | | status | creating | | type| pure | | updated_at | None | | user_id | c1fa4ae9a0b446f2ba64eebf92705d53 | +-+--+ jog@pnjostkinfr01:~⟫ openstack volume show pure2 ++--+ | Field | Value| ++--+ | attachments| [] | | availability_zone | nova | | bootable | false| | consistencygroup_id| None | | created_at | 2019-02-13T23:07:40.00 | | description| None | | encrypted | False| | id | e286161b-e8e8-47b0-abe3-4df411993265 | | migration_status | None | | multiattach| False| | name | pure2| | os-vol-host-attr:host | cinder@cinder-pure#cinder-pure | | os-vol-mig-status-attr:migstat | None | | os-vol-mig-status-attr:name_id | None | | os-vol-tenant-attr:tenant_id | 9be499fd1eee48dfb4dc6faf3cc0a1d7 | | properties | | | replication_status | None | | size | 10 | | snapshot_id| None | | source_volid | None | | status | available| | type | pure | | updated_at | 2019-02-13T23:07:41.00 | | user_id| c1fa4ae9a0b446f2ba64eebf92705d53 | ++--+ Add the volume to an instance: jog@pnjostkinfr01:~⟫ openstack server add volume T1 pure2 jog@pnjostkinfr01:~⟫ openstack server show T1
[Yahoo-eng-team] [Bug 1742421] [NEW] Cells Layout (v2) in nova doc misleading about upcalls
Public bug reported: - [X] This doc is inaccurate in this way: Documentation suggests nova v2 cells do not make 'upcalls' but they do when talking to the placement api. - [ ] This is a doc addition request. - [ ] I have a fix to the document that I can paste below including example: input and output. It is important to note that services in the lower cell boxes only have the ability to call back to the placement API and no other API-layer services via RPC, nor do they have access to the API database for global visibility of resources across the cloud. This is intentional and provides security and failure domain isolation benefits, but also has impacts on somethings that would otherwise require this any-to-any communication style. Check the release notes for the version of Nova you are using for the most up-to-date information about any caveats that may be present due to this limitation. --- Release: 17.0.0.0b3.dev323 on 2018-01-09 21:52 SHA: 90a92d33edaea2b7411a5fd528f3159a486e1fd0 Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/user/cellsv2-layout.rst URL: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html ** Affects: nova Importance: Undecided Status: New -- You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova). https://bugs.launchpad.net/bugs/1742421 Title: Cells Layout (v2) in nova doc misleading about upcalls Status in OpenStack Compute (nova): New Bug description: - [X] This doc is inaccurate in this way: Documentation suggests nova v2 cells do not make 'upcalls' but they do when talking to the placement api. - [ ] This is a doc addition request. - [ ] I have a fix to the document that I can paste below including example: input and output. It is important to note that services in the lower cell boxes only have the ability to call back to the placement API and no other API-layer services via RPC, nor do they have access to the API database for global visibility of resources across the cloud. This is intentional and provides security and failure domain isolation benefits, but also has impacts on somethings that would otherwise require this any-to-any communication style. Check the release notes for the version of Nova you are using for the most up-to-date information about any caveats that may be present due to this limitation. --- Release: 17.0.0.0b3.dev323 on 2018-01-09 21:52 SHA: 90a92d33edaea2b7411a5fd528f3159a486e1fd0 Source: https://git.openstack.org/cgit/openstack/nova/tree/doc/source/user/cellsv2-layout.rst URL: https://docs.openstack.org/nova/latest/user/cellsv2-layout.html To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1742421/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1742649] [NEW] map_instances default batch size is too small.
Public bug reported:

Description
===
map_instances seemingly hung for hours on a cloud with ~19 instance records. I think the following fixes are valid (in order of preference):

1) nova-manage should examine the amount of instances that need mapping and make an informed choice about batch size if max_count is not set.
2) max_count's default should be raised. It is currently 50 and I cannot imagine what use case 50 is a good default for. For small clouds the max_count is almost irrelevant, for medium/large clouds 50 is far too low.
3) Update the max_count description. It currently reads "Maximum number of instances to map" but I think it should also point out that this is the batch size that instances will be processed in.

Steps to reproduce
==
Fire up a large number of instances on a cloud and run map_instances without max_count set:
nova-manage --config-file /etc/nova/nova.conf cell_v2 map_instances --cell_uuid

Expected result
===
The command should complete in a reasonable time (under an hour)

Actual result
=
Command runs for over three hours

Environment
===
1. Exact version of OpenStack you are running.
See the following list for all releases: http://docs.openstack.org/releases/
If this is from a distro please provide
# dpkg -l | grep nova
ii nova-api-os-compute 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - OpenStack Compute API frontend
ii nova-common 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - common files
ii nova-conductor 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - conductor service
ii nova-placement-api 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - placement API frontend
ii nova-scheduler 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler
ii python-nova 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute Python libraries

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1742649

Title: map_instances default batch size is too small.

Status in OpenStack Compute (nova): New

Bug description:
Description
===
map_instances seemingly hung for hours on a cloud with ~19 instance records. I think the following fixes are valid (in order of preference):

1) nova-manage should examine the amount of instances that need mapping and make an informed choice about batch size if max_count is not set.
2) max_count's default should be raised. It is currently 50 and I cannot imagine what use case 50 is a good default for. For small clouds the max_count is almost irrelevant, for medium/large clouds 50 is far too low.
3) Update the max_count description. It currently reads "Maximum number of instances to map" but I think it should also point out that this is the batch size that instances will be processed in.

Steps to reproduce
==
Fire up a large number of instances on a cloud and run map_instances without max_count set:
nova-manage --config-file /etc/nova/nova.conf cell_v2 map_instances --cell_uuid

Expected result
===
The command should complete in a reasonable time (under an hour)

Actual result
=
Command runs for over three hours

Environment
===
1. Exact version of OpenStack you are running.
See the following list for all releases: http://docs.openstack.org/releases/ If this is from a distro please provide # dpkg -l | grep nova ii nova-api-os-compute 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - OpenStack Compute API frontend ii nova-common 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - common files ii nova-conductor 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - conductor service ii nova-placement-api 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - placement API frontend ii nova-scheduler 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute - virtual machine scheduler ii python-nova 2:16.0.3-0ubuntu1~cloud0 all OpenStack Compute Python libraries To manage notifications about this bug go to: https://bugs.launchpad.net/nova/+bug/1742649/+subscriptions -- Mailing list: https://launchpad.net/~yahoo-eng-team Post to : yahoo-eng-team@lists.launchpad.net Unsubscribe : https://launchpad.net/~yahoo-eng-team More help : https://
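Until the default changes, a larger batch size can be passed explicitly; the value below is only an example and <cell-uuid> is a placeholder for the target cell's UUID. As far as I can tell, when --max-count is given the command exits 1 while instances remain to be mapped and 0 once everything is mapped, so it can simply be re-run until it returns 0:

nova-manage --config-file /etc/nova/nova.conf cell_v2 map_instances \
    --cell_uuid <cell-uuid> --max-count 10000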
[Yahoo-eng-team] [Bug 1472712] Re: Using SSL with rabbitmq prevents communication between nova-compute and conductor after latest nova updates
** Also affects: python-oslo.messaging (Ubuntu)
   Importance: Undecided
   Status: New

** Changed in: oslo.messaging
   Status: Confirmed => Invalid

** Changed in: nova
   Status: New => Invalid

** Changed in: python-oslo.messaging (Ubuntu)
   Status: New => Confirmed

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1472712

Title: Using SSL with rabbitmq prevents communication between nova-compute and conductor after latest nova updates

Status in OpenStack Compute (nova): Invalid
Status in oslo.messaging: Invalid
Status in python-oslo.messaging package in Ubuntu: Confirmed

Bug description: On the latest update of the Ubuntu OpenStack packages, it was discovered that the nova-compute/nova-conductor (1:2014.1.4-0ubuntu2.1) packages encountered a bug with using SSL to connect to rabbitmq. When this problem occurs, the compute node cannot connect to the controller, and this message is constantly displayed:

WARNING nova.conductor.api [req-4022395c-9501-47cf-bf8e-476e1cc58772 None None] Timed out waiting for nova-conductor. Is it running? Or did this service start before nova-conductor?

Investigation revealed that having rabbitmq configured with SSL was the root cause of this problem. This seems to have been introduced with the current version of the nova packages. Rabbitmq was not updated as part of this distribution update, but the messaging library (python-oslo.messaging 1.3.0-0ubuntu1.1) was updated. So the problem could exist in any of these components.

Versions installed:
Openstack version: Icehouse
Ubuntu 14.04.2 LTS
nova-conductor 1:2014.1.4-0ubuntu2.1
nova-compute 1:2014.1.4-0ubuntu2.1
rabbitmq-server 3.2.4-1
openssl:amd64/trusty-security 1.0.1f-1ubuntu2.15

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1472712/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
[Yahoo-eng-team] [Bug 1327218] Re: Volume detach failure because of invalid bdm.connection_info
The fix went into 2015.1.0, and 2015.1.1 is now in the cloud archive.

** Changed in: nova (Ubuntu)
   Status: New => Fix Released

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1327218

Title: Volume detach failure because of invalid bdm.connection_info

Status in OpenStack Compute (nova): Fix Released
Status in nova package in Ubuntu: Fix Released
Status in nova source package in Trusty: New

Bug description: Example of this here: http://logs.openstack.org/33/97233/1/check/check-grenade-dsvm/f7b8a11/logs/old/screen-n-cpu.txt.gz?level=TRACE#_2014-06-02_14_13_51_125

  File "/opt/stack/old/nova/nova/compute/manager.py", line 4153, in _detach_volume
    connection_info = jsonutils.loads(bdm.connection_info)
  File "/opt/stack/old/nova/nova/openstack/common/jsonutils.py", line 164, in loads
    return json.loads(s)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer

and this was in grenade with stable/icehouse nova commit 7431cb9. There's nothing unusual about the test which triggers this - it simply attaches a volume to an instance, waits for it to show up in the instance and then tries to detach it.

logstash query for this: message:"Exception during message handling" AND message:"expected string or buffer" AND message:"connection_info = jsonutils.loads(bdm.connection_info)" AND tags:"screen-n-cpu.txt" but it seems to be very rare

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1327218/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help : https://help.launchpad.net/ListHelp
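The TypeError in this traceback is exactly what json.loads() raises when it is handed None instead of a JSON string, i.e. when the block device mapping has no connection_info recorded. A minimal sketch follows; the guard shown is illustrative, not nova's actual fix.

import json

try:
    json.loads(None)           # what the detach path effectively did
except TypeError as exc:
    print(exc)                 # "expected string or buffer" on Python 2


def load_connection_info(raw):
    # Hypothetical guard: treat a missing connection_info as "nothing to
    # disconnect" instead of feeding None to the JSON decoder.
    if not raw:
        return None
    return json.loads(raw)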
[Yahoo-eng-team] [Bug 1314677] [NEW] nova-cells fails when using JSON file to store cell information
Public bug reported: As recommended in http://docs.openstack.org/havana/config-reference/content/section_compute-cells.html#cell-config-optional-json I'm creating the nova-cells config with the cell information stored in a json file. However, when I do this nova-cells fails to start with this error in the logs:

2014-04-29 11:52:05.240 16759 CRITICAL nova [-] __init__() takes exactly 3 arguments (1 given)
2014-04-29 11:52:05.240 16759 TRACE nova Traceback (most recent call last):
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/bin/nova-cells", line 10, in <module>
2014-04-29 11:52:05.240 16759 TRACE nova     sys.exit(main())
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/cmd/cells.py", line 40, in main
2014-04-29 11:52:05.240 16759 TRACE nova     manager=CONF.cells.manager)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 257, in create
2014-04-29 11:52:05.240 16759 TRACE nova     db_allowed=db_allowed)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/service.py", line 139, in __init__
2014-04-29 11:52:05.240 16759 TRACE nova     self.manager = manager_class(host=self.host, *args, **kwargs)
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/lib/python2.7/dist-packages/nova/cells/manager.py", line 87, in __init__
2014-04-29 11:52:05.240 16759 TRACE nova     self.state_manager = cell_state_manager()
2014-04-29 11:52:05.240 16759 TRACE nova TypeError: __init__() takes exactly 3 arguments (1 given)

I have had a dig into the code and it appears that CellsManager creates an instance of CellStateManager with no arguments. CellStateManager __new__ runs and creates an instance of CellStateManagerFile which runs __new__ and __init__ with cell_state_cls and cells_config_path set. At this point __new__ returns CellStateManagerFile and the new instance's __init__() method is invoked (CellStateManagerFile.__init__) with the original arguments (there weren't any) which then results in the stack trace.

It seems reasonable for CellStateManagerFile to derive the cells_config_path info for itself, so I've patched it locally with:

=== modified file 'state.py'
--- state.py    2014-04-30 15:10:16 +0000
+++ state.py    2014-04-30 15:10:26 +0000
@@ -155,7 +155,7 @@
             config_path = CONF.find_file(cells_config)
             if not config_path:
                 raise cfg.ConfigFilesNotFoundError(config_files=[cells_config])
-            return CellStateManagerFile(cell_state_cls, config_path)
+            return CellStateManagerFile(cell_state_cls)
         return CellStateManagerDB(cell_state_cls)

@@ -450,7 +450,9 @@
 class CellStateManagerFile(CellStateManager):
-    def __init__(self, cell_state_cls, cells_config_path):
+    def __init__(self, cell_state_cls=None):
+        cells_config = CONF.cells.cells_config
+        cells_config_path = CONF.find_file(cells_config)
         self.cells_config_path = cells_config_path
         super(CellStateManagerFile, self).__init__(cell_state_cls)

Ubuntu: 14.04
nova-cells: 1:2014.1-0ubuntu1

nova.conf:
[DEFAULT]
dhcpbridge_flagfile=/etc/nova/nova.conf
dhcpbridge=/usr/bin/nova-dhcpbridge
logdir=/var/log/nova
state_path=/var/lib/nova
lock_path=/var/lock/nova
force_dhcp_release=True
iscsi_helper=tgtadm
libvirt_use_virtio_for_bridges=True
connection_type=libvirt
root_helper=sudo nova-rootwrap /etc/nova/rootwrap.conf
verbose=True
ec2_private_dns_show_ip=True
api_paste_config=/etc/nova/api-paste.ini
volumes_path=/var/lib/nova/volumes
enabled_apis=ec2,osapi_compute,metadata
auth_strategy=keystone
compute_driver=libvirt.LibvirtDriver
quota_driver=nova.quota.NoopQuotaDriver

[cells]
enable=True
name=cell
cell_type=compute
cells_config=/etc/nova/cells.json

cells.json:
{
    "parent": {
        "name": "parent",
        "api_url": "http://api.example.com:8774",
        "transport_url": "rabbit://rabbit.example.com",
        "weight_offset": 0.0,
        "weight_scale": 1.0,
        "is_parent": true
    }
}

** Affects: nova
   Importance: Undecided
   Status: New

--
You received this bug notification because you are a member of Yahoo! Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1314677

Title: nova-cells fails when using JSON file to store cell information

Status in OpenStack Compute (Nova): New

Bug description: As recommended in http://docs.openstack.org/havana/config-reference/content/section_compute-cells.html#cell-config-optional-json I'm creating the nova-cells config with the cell information stored in a json file. However, when I do this nova-cells fails to start with this error in the logs:

2014-04-29 11:52:05.240 16759 CRITICAL nova [-] __init__() takes exactly 3 arguments (1 given)
2014-04-29 11:52:05.240 16759 TRACE nova Traceback (most recent call last):
2014-04-29 11:52:05.240 16759 TRACE nova   File "/usr/bin/nova-cell