Public bug reported:

Description
===========

We started to experience instance build failures caused by port
binding failures on our 2024.1 system (with VF-LAG). The failure is
reported as:

```
Refusing to bind due to unsupported vnic_type: direct with no switchdev capability bind_port
```

The switchdev capability was indeed missing from the VF's entry in nova's
pci_devices table:
```
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr | uuid |
| 2024-08-08 13:24:38 | 2025-03-24 10:05:24 | NULL | 0 | 19153 | 2782 | 0000:a1:01.5 | 101a | 15b3 | type-VF | pci_0000_a1_01_5 | label_15b3_101a | available | {"parent_ifname": "ens2f0_14", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"3e:0b:a6:3d:08:51\", \"vf_num\": 11}, \"vpd\": {\"card_serial_number\": \"IL09FTMY74031167007R\"}}"}
```
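
For reference, the extra_info column is JSON whose "capabilities" value is
itself a JSON-encoded string, so checking for switchdev means decoding twice.
A minimal Python sketch of that check, using the row above as input (the
helper names network_capabilities and has_switchdev are made up for
illustration and are not nova APIs):

```python
import json

# extra_info value copied from the affected row above (VF 0000:a1:01.5).
EXTRA_INFO = r'''{"parent_ifname": "ens2f0_14", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"3e:0b:a6:3d:08:51\", \"vf_num\": 11}, \"vpd\": {\"card_serial_number\": \"IL09FTMY74031167007R\"}}"}'''


def network_capabilities(extra_info: str) -> list[str]:
    """Return the "network" capability list, or [] if the key is absent."""
    outer = json.loads(extra_info)
    # "capabilities" is stored as a JSON string inside the JSON document,
    # so it has to be decoded a second time.
    caps = json.loads(outer.get("capabilities", "{}"))
    return caps.get("network", [])


def has_switchdev(extra_info: str) -> bool:
    """True if the VF advertises the switchdev network capability."""
    return "switchdev" in network_capabilities(extra_info)


print(has_switchdev(EXTRA_INFO))
```

For the affected row this prints False, because the decoded capabilities
contain only the sriov and vpd keys.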

Nova should be correctly assigning VF capabilities following this patch:
https://review.opendev.org/c/openstack/nova/+/884439, and in our case the DB
entry for a VF should look like:
```
| created_at | updated_at | deleted_at | deleted | id | compute_node_id | address | product_id | vendor_id | dev_type | dev_id | label | status | extra_info | instance_uuid | request_id | numa_node | parent_addr | uuid |
| 2024-08-08 13:25:19 | 2025-03-26 10:05:31 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```

Steps to Reproduce
==================
* create an instance with VF-LAG SRIOV 'direct' NIC
* restart nova-compute on that hypervisor
* delete instance

Afterwards, the VF's row in the pci_devices table is left with incomplete
capabilities (the "network" list is missing) until nova-compute is
restarted again.

Expected Result
===============
The VF entry in pci_devices should contain the full set of capabilities

DB output
=========
This is the expected content of the pci_devices row before the VF is
allocated, while it is allocated, and after the instance is torn down:
```
Before:
| 2024-08-08 13:25:19 | 2025-03-26 10:37:10 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

Allocated:
| 2024-08-08 13:25:19 | 2025-03-26 10:40:52 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | 0af885b2-921a-4af8-9cec-f227c82e4b86 | 12f68837-b9f9-4993-a242-74c901483440 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

Instance torn down:
| 2024-08-08 13:25:19 | 2025-03-26 10:49:06 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```

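But this is the content of the DB if nova-compute is restarted during the
lifetime of the instance. After teardown the "network" capability list
(including switchdev) is missing, and it only reappears once nova-compute
is restarted again: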
```
Before:
| 2024-08-08 13:25:19 | 2025-03-26 10:05:31 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

Allocated:
| 2024-08-08 13:25:19 | 2025-03-26 10:13:17 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | f05aa8e2-269d-4c45-ad4a-2a711b71fbed | e8c2a25b-9637-4935-ad09-cfca34f7e919 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

nova-compute restarted:
| 2024-08-08 13:25:19 | 2025-03-26 10:13:17 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | allocated | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | f05aa8e2-269d-4c45-ad4a-2a711b71fbed | e8c2a25b-9637-4935-ad09-cfca34f7e919 | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

Instance torn down:
| 2024-08-08 13:25:19 | 2025-03-26 10:35:27 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |

nova-compute restarted again:
| 2024-08-08 13:25:19 | 2025-03-26 10:37:10 | NULL | 0 | 19768 | 2773 | 0000:a1:04.0 | 101a | 15b3 | type-VF | pci_0000_a1_04_0 | label_15b3_101a | available | {"parent_ifname": "ens2f0_18", "capabilities": "{\"network\": [\"rx\", \"tx\", \"sg\", \"tso\", \"gso\", \"gro\", \"rxvlan\", \"txvlan\", \"rxhash\", \"rdma\", \"switchdev\"], \"sriov\": {\"pf_mac_address\": \"42:63:6a:66:3d:a7\", \"vf_num\": 30}, \"vpd\": {\"card_serial_number\": \"IL09FTMY7403112G002Y\"}}"} | NULL | NULL | 6 | 0000:a1:00.0 | b6995f86-aa97-4c1f-a09a-ce9a421c1d9a |
```
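
To make the difference between the allocated row and the torn-down row in
the restart case explicit, here is a small Python sketch that decodes both
extra_info values and compares their capability keys. The values are
re-encoded from the rows above; decode_capabilities is an illustrative
helper, not part of nova:

```python
import json


def decode_capabilities(extra_info: str) -> dict:
    """Decode the doubly JSON-encoded "capabilities" field of extra_info."""
    return json.loads(json.loads(extra_info)["capabilities"])


# extra_info as recorded while the VF was allocated (after the restart).
allocated = json.dumps({
    "parent_ifname": "ens2f0_18",
    "capabilities": json.dumps({
        "network": ["rx", "tx", "sg", "tso", "gso", "gro", "rxvlan",
                    "txvlan", "rxhash", "rdma", "switchdev"],
        "sriov": {"pf_mac_address": "42:63:6a:66:3d:a7", "vf_num": 30},
        "vpd": {"card_serial_number": "IL09FTMY7403112G002Y"},
    }),
})

# extra_info as recorded after the instance was torn down.
torn_down = json.dumps({
    "parent_ifname": "ens2f0_18",
    "capabilities": json.dumps({
        "sriov": {"pf_mac_address": "42:63:6a:66:3d:a7", "vf_num": 30},
        "vpd": {"card_serial_number": "IL09FTMY7403112G002Y"},
    }),
})

before = decode_capabilities(allocated)
after = decode_capabilities(torn_down)

# Top-level capability keys that disappeared on teardown.
lost_keys = sorted(set(before) - set(after))
# Keys whose values survived teardown unchanged.
unchanged = sorted(k for k in after if before.get(k) == after[k])

print("lost:", lost_keys)
print("unchanged:", unchanged)
```

Only the "network" key is lost; the sriov and vpd entries are identical in
both rows.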

Environment
===========
OpenStack 2024.1
Kolla-Ansible
Rocky 9 + KVM
Neutron OVS with Mellanox VF-LAG on ConnectX-5

** Affects: nova
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2104255

Title:
  nova-compute restart stripping VF capabilities on VF unbind

Status in OpenStack Compute (nova):
  New
