Public bug reported:

We have seen tpi- and spi- interfaces in OVS that ovs-agent should have
deleted but did not.

At the moment I only have a chance-based reproduction, with wildly
varying frequency of the error symptoms:

ovs-dump() {
    for bridge in $( sudo ovs-vsctl list-br )
    do
        for port in $( sudo ovs-vsctl list-ports "$bridge" )
        do
            echo "$bridge" "$port"
        done
    done | sort
}

ovs-dump > ovs-state.0

for j in $( seq 1 10 )
do
    openstack network create tnet0
    openstack subnet create --network tnet0 --subnet-range 10.0.100.0/24 tsubnet0
    openstack port create --network tnet0 tport0
    openstack network trunk create --parent-port tport0 trunk0
    tport0_mac="$( openstack port show tport0 -f value -c mac_address )"

    for i in $( seq 1 30 )
    do
        openstack network create tnet$i
        openstack subnet create --network tnet$i --subnet-range 10.0.$(( 100 + $i )).0/24 tsubnet$i
        openstack port create --network tnet$i --mac-address "$tport0_mac" tport$i
        openstack network trunk set --subport port=tport$i,segmentation-type=vlan,segmentation-id=$(( 100 + $i )) trunk0
    done

    openstack server create --flavor cirros256 --image cirros-0.6.3-x86_64-disk --nic port-id=tport0 tvm0 --wait
    # Theoretically not needed, but still make sure we don't interrupt any
    # work in progress, to make the repro more uniform.
    while [ "$( openstack network trunk show trunk0 -f value -c status )" != "ACTIVE" ]
    do
        sleep 1
    done

    openstack server delete tvm0 --wait
    openstack network trunk delete trunk0
    openstack port list -f value -c ID -c Name | awk '/tport/ { print $1 }' | xargs -r openstack port delete
    openstack net list -f value -c ID -c Name | awk '/tnet/ { print $1 }' | xargs -r openstack net delete
done

sleep 10
ovs-dump > ovs-state.1

diff -u ovs-state.{0,1}
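To quantify the leftovers per prefix without reading the whole diff, the two snapshots can be compared with a small helper (a sketch; the `count_leftovers` name is mine, and it only assumes the "bridge port" line format produced by ovs-dump above):

```shell
# count_leftovers BASELINE NEW: count ports present in NEW but not in
# BASELINE, broken down by the trunk-related interface prefixes.
count_leftovers() {
    for prefix in tpi- spi-
    do
        # grep -vxFf drops every line that already existed in the baseline;
        # grep -c then counts the new lines carrying the given prefix.
        count=$( grep -vxFf "$1" "$2" | grep -c " $prefix" || true )
        echo "leftover ${prefix}ports: $count"
    done
}
```

For example, `count_leftovers ovs-state.0 ovs-state.1` reports the tpi- and spi- counts separately.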

One example output with j=1..20 and i=1..30:

--- ovs-state.0 2025-01-16 13:31:07.881407421 +0000
+++ ovs-state.1 2025-01-16 14:52:45.323392243 +0000
@@ -8,9 +8,27 @@
 br-int qr-88029aef-01
 br-int sg-73e24638-69
 br-int sg-e45cf925-de
+br-int spi-1eeb4ae6-1b
+br-int spi-2093a8c2-df
+br-int spi-2d9ae883-d9
+br-int spi-3f17d563-cd
+br-int spi-9c0d9c98-d8
+br-int spi-a2dc4baf-ef
+br-int spi-af2efafa-39
+br-int spi-c14e8bc3-62
+br-int spi-c16959f8-da
+br-int spi-e90d4d84-31
 br-int tap03961474-06
 br-int tap3e6a6311-95
 br-int tpi-1f8b5666-bf
+br-int tpi-2477b06f-5d
+br-int tpi-4421d69a-be
+br-int tpi-572a3af8-42
 br-int tpi-9cf24ba1-ba
+br-int tpi-9e60cb66-5e
+br-int tpi-a533a27b-78
+br-int tpi-cddcaa7b-15
+br-int tpi-d7cd2e3e-e6
+br-int tpi-e68ca29d-4d
 br-physnet0 phy-br-physnet0
 br-tun patch-int

These ports are not cleaned up even by an ovs-agent restart. During the
runs I have not found any ERROR messages in the ovs-agent logs.
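For the record, the orphans can be removed manually. A hedged sketch (the `stale_trunk_ports` filter name is mine; deleting is of course only safe once the ports are confirmed stale):

```shell
# Select port names that look like leftover trunk plumbing (tpi-/spi-).
stale_trunk_ports() {
    grep -E '^(tpi|spi)-'
}

# Manual cleanup, for illustration only (requires a live OVS; run it only
# after confirming no trunk still uses these ports):
#   sudo ovs-vsctl list-ports br-int | stale_trunk_ports \
#       | xargs -r -n1 sudo ovs-vsctl --if-exists del-port br-int
```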

The number of ports left behind varies wildly. I have seen cases where
more than 50% of the VM create/delete cycles left behind a tpi port. But
I have also seen cases where it took ten runs (j=1..10) to see the first
leftover interface. This makes me believe there is a causal factor here
(probably timing based) that I do not understand and cannot control yet.

I want to get back to analysing the root cause, but first I hope to find
a quicker and more reliable reproduction method so this becomes easier
to work with.
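One possible direction for a quicker repro: stop at the first iteration that leaves a port behind, instead of running all iterations and diffing at the end. A sketch (the `run_until_leftover` wrapper is hypothetical; it relies on the ovs-dump helper above and takes one create/boot/delete cycle as a command):

```shell
# run_until_leftover N BASELINE CYCLE...: run CYCLE up to N times,
# snapshot OVS state after each run via the ovs-dump helper defined at
# the top of this report, and stop at the first iteration whose snapshot
# differs from the BASELINE file.
run_until_leftover() {
    n=$1
    baseline=$2
    shift 2
    for j in $( seq 1 "$n" )
    do
        "$@"                          # one create/boot/delete cycle
        ovs-dump > ovs-state.1
        if ! diff -q "$baseline" ovs-state.1 > /dev/null
        then
            echo "first leftover after iteration $j"
            return 0
        fi
    done
    echo "no leftover in $n iterations"
}
```

For example, `run_until_leftover 20 ovs-state.0 one_cycle`, where `one_cycle` wraps the loop body above.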

devstack 2f3440dc
neutron 8cca47f2e7

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovs trunk

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2095152

Title:
  ovs-agent: Leftover tpi/spi interfaces after VM boot/delete with trunk
  port(s)

Status in neutron:
  New


To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2095152/+subscriptions


-- 
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
