Public bug reported:

We have seen tpi- and spi- interfaces in ovs that were not deleted by ovs-agent when they should have been deleted already.
At the moment I only have a chance-based reproduction, with wildly varying frequency of the error symptoms:

ovs-dump() {
    for bridge in $( sudo ovs-vsctl list-br )
    do
        for port in $( sudo ovs-vsctl list-ports $bridge )
        do
            echo $bridge $port
        done
    done | sort
}

ovs-dump > ovs-state.0

for j in $( seq 1 10 )
do
    openstack network create tnet0
    openstack subnet create --network tnet0 --subnet-range 10.0.100.0/24 tsubnet0
    openstack port create --network tnet0 tport0
    openstack network trunk create --parent-port tport0 trunk0
    tport0_mac="$( openstack port show tport0 -f value -c mac_address )"

    for i in $( seq 1 30 )
    do
        openstack network create tnet$i
        openstack subnet create --network tnet$i --subnet-range 10.0.$(( 100 + $i )).0/24 tsubnet$i
        openstack port create --network tnet$i --mac-address "$tport0_mac" tport$i
        openstack network trunk set --subport port=tport$i,segmentation-type=vlan,segmentation-id=$(( 100 + $i )) trunk0
    done

    openstack server create --flavor cirros256 --image cirros-0.6.3-x86_64-disk --nic port-id=tport0 tvm0 --wait

    # Theoretically not needed, but still make sure we don't interrupt
    # any work in progress, to make the repro more uniform.
    while [ "$( openstack network trunk show trunk0 -f value -c status )" != "ACTIVE" ]
    do
        sleep 1
    done

    openstack server delete tvm0 --wait
    openstack network trunk delete trunk0
    openstack port list -f value -c ID -c Name | awk '/tport/ { print $1 }' | xargs -r openstack port delete
    openstack net list -f value -c ID -c Name | awk '/tnet/ { print $1 }' | xargs -r openstack net delete
done

sleep 10

ovs-dump > ovs-state.1
diff -u ovs-state.{0,1}

One example output with j=1..20 and i=1..30:

--- ovs-state.0 2025-01-16 13:31:07.881407421 +0000
+++ ovs-state.1 2025-01-16 14:52:45.323392243 +0000
@@ -8,9 +8,27 @@
 br-int qr-88029aef-01
 br-int sg-73e24638-69
 br-int sg-e45cf925-de
+br-int spi-1eeb4ae6-1b
+br-int spi-2093a8c2-df
+br-int spi-2d9ae883-d9
+br-int spi-3f17d563-cd
+br-int spi-9c0d9c98-d8
+br-int spi-a2dc4baf-ef
+br-int spi-af2efafa-39
+br-int spi-c14e8bc3-62
+br-int spi-c16959f8-da
+br-int spi-e90d4d84-31
 br-int tap03961474-06
 br-int tap3e6a6311-95
 br-int tpi-1f8b5666-bf
+br-int tpi-2477b06f-5d
+br-int tpi-4421d69a-be
+br-int tpi-572a3af8-42
 br-int tpi-9cf24ba1-ba
+br-int tpi-9e60cb66-5e
+br-int tpi-a533a27b-78
+br-int tpi-cddcaa7b-15
+br-int tpi-d7cd2e3e-e6
+br-int tpi-e68ca29d-4d
 br-physnet0 phy-br-physnet0
 br-tun patch-int

These ports are not cleaned up even by an ovs-agent restart. During the runs I have not found any ERROR messages in the ovs-agent logs.

The number of ports left behind varies wildly. I have seen cases where more than 50% of the VM boot/delete cycles each left behind a tpi port, but I have also seen cases where it took ten runs (j=1..10) to produce the first leftover interface. This makes me believe there is a causal factor here (probably timing based) that I do not yet understand and cannot control.

I want to get back to analysing the root cause, but I hope that first I can find a quicker and more reliable reproduction method, so this becomes easier to work with.
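Until I have that, here are two sketches that might help to iterate faster. Both are untested variations on the script above, not something I have verified.

First, to catch the first leak as early as possible, the state diff can be taken after every iteration instead of only at the end, stopping at the first leftover. Here one-cycle is a hypothetical wrapper around the body of one j-iteration of the loop above:

one-cycle() {
    :  # paste the body of one j-iteration of the reproduction here
}

ovs-dump > ovs-state.0
j=0
while true
do
    j=$(( j + 1 ))
    one-cycle
    sleep 10  # same settle time as in the original script
    ovs-dump > ovs-state.1
    # diff exits non-zero when the files differ, i.e. when something leaked
    if ! diff -u ovs-state.{0,1}
    then
        echo "first leftover port(s) appeared in iteration $j"
        break
    fi
done

Second, since even an agent restart does not remove the leftovers, something like the following should clear them by hand between experiments (assuming a test node where every tpi-/spi- port on br-int really is a leftover):

# delete every tpi-/spi- port currently attached to br-int
sudo ovs-vsctl list-ports br-int \
    | grep -E '^(tpi|spi)-' \
    | xargs -r -n1 sudo ovs-vsctl --if-exists del-port br-int

Obviously this only hides the symptom; it does not address whatever leaves the ports behind.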
devstack 2f3440dc
neutron 8cca47f2e7

** Affects: neutron
     Importance: Undecided
         Status: New

** Tags: ovs trunk

https://bugs.launchpad.net/bugs/2095152
Title: ovs-agent: Leftover tpi/spi interfaces after VM boot/delete with trunk port(s)