Thanks for flagging the potential security impact of this. Can someone provide a succinct exploit scenario for how an attacker might cause this to occur and then take advantage of it? Or is it merely one of those situations where someone could take advantage of the issue if they happen to find an environment where the necessary conditions were already met?
** Also affects: ossa
   Importance: Undecided
       Status: New

** Changed in: ossa
       Status: New => Incomplete

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2048785

Title:
  Trunk parent port (tpt port) vlan_mode is wrong in ovs

Status in neutron:
  In Progress
Status in OpenStack Security Advisory:
  Incomplete

Bug description:
  ... therefore a forwarding loop, packet duplication, packet loss and
  double tagging is possible.

  Today a trunk bridge with one parent and one subport looks like this:

  # ovs-vsctl show
  ...
      Bridge tbr-b2781877-3
          datapath_type: system
          Port spt-28c9689e-9e
              tag: 101
              Interface spt-28c9689e-9e
                  type: patch
                  options: {peer=spi-28c9689e-9e}
          Port tap3709f1a1-a5
              Interface tap3709f1a1-a5
          Port tpt-3709f1a1-a5
              Interface tpt-3709f1a1-a5
                  type: patch
                  options: {peer=tpi-3709f1a1-a5}
          Port tbr-b2781877-3
              Interface tbr-b2781877-3
                  type: internal
  ...

  # ovs-vsctl find Port name=tpt-3709f1a1-a5 | egrep 'tag|vlan_mode|trunks'
  tag                 : []
  trunks              : []
  vlan_mode           : []

  # ovs-vsctl find Port name=spt-28c9689e-9e | egrep 'tag|vlan_mode|trunks'
  tag                 : 101
  trunks              : []
  vlan_mode           : []

  I believe the vlan_mode of the tpt port is wrong (at least when the
  port is not "vlan_transparent") and it should have the value "access".
  Even when the port is "vlan_transparent", forwarding loops between
  br-int and a trunk bridge should be prevented.

  According to:
  http://www.openvswitch.org/support/dist-docs/ovs-vswitchd.conf.db.5.txt

  """
  vlan_mode: optional string, one of access, dot1q-tunnel, native-tagged,
  native-untagged, or trunk
      The VLAN mode of the port, as described above. When this column is
      empty, a default mode is selected as follows:

      • If tag contains a value, the port is an access port. The trunks
        column should be empty.

      • Otherwise, the port is a trunk port. The trunks column value is
        honored if it is present.
  """

  """
  trunks: set of up to 4,096 integers, in range 0 to 4,095
      For a trunk, native-tagged, or native-untagged port, the 802.1Q
      VLAN or VLANs that this port trunks; if it is empty, then the port
      trunks all VLANs. Must be empty if this is an access port.

      A native-tagged or native-untagged port always trunks its native
      VLAN, regardless of whether trunks includes that VLAN.
  """

  The above combination of tag, trunks and vlan_mode for the tpt port
  means that it is in trunk mode (in the ovs sense) and it forwards both
  untagged and tagged frames with any vlan tag. But the tpt port should
  only forward untagged frames.

  Feel free to treat this as the end of the bug report. But below I'll
  add more about how we found this bug, under what conditions it can be
  triggered, and what consequences it may have. However please keep in
  mind I don't have a full upstream reproduction at the moment. Nor do I
  have a full analysis of every suspicion mentioned below.

  I'm aware of a full reproduction of this bug only in a downstream
  environment, which looked like below. While the following was
  sufficient to reproduce the problem, this was likely far from a
  minimal reproduction and some/many of the below steps are unnecessary.

  * [securitygroup].firewall_driver = openvswitch ((edited, originally was: noop))
  * [agent].explicitly_egress_direct = True ((edited, originally was: [ovs].explicitly_egress_direct = True))
  * 2 VMs started on the same compute.
  * Both having a trunk port with one parent and one subport.
  * The parent and the subport of each trunk have the same MAC address.
  * All ports are on vlan networks belonging to the same physnet.
  * All ports are created with --disable-port-security and --no-security-group.
  * The subport segmentation IDs and the corresponding vlan network segmentation IDs were the same (as if they used "inherit").
  * Traffic was generated from a 3rd VM on a different compute addressed to one of the VMs' subport IPs, for which
  * the destination MAC was not yet learned by either br-int or the two trunk bridges on the host.

  I believe the environment looked like this:

  openstack network create net0 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 100
  openstack network create net1 --provider-network-type vlan --provider-physical-network physnet0 --provider-segment 101
  openstack subnet create --network net0 --subnet-range 10.0.100.0/24 subnet0
  openstack subnet create --network net1 --subnet-range 10.0.101.0/24 subnet1
  openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.10 port0a
  port0a_mac="$( openstack port show port0a -f value -c mac_address )"
  openstack port create --no-security-group --disable-port-security --mac-address "$port0a_mac" --network net1 --fixed-ip ip-address=10.0.101.10 port1a
  openstack port create --no-security-group --disable-port-security --network net0 --fixed-ip ip-address=10.0.100.11 port0b
  port0b_mac="$( openstack port show port0b -f value -c mac_address )"
  openstack port create --no-security-group --disable-port-security --mac-address "$port0b_mac" --network net1 --fixed-ip ip-address=10.0.101.11 port1b
  openstack network trunk create --parent-port port0a trunka
  openstack network trunk set --subport port=port1a,segmentation-type=vlan,segmentation-id=101 trunka
  openstack network trunk create --parent-port port0b trunkb
  openstack network trunk set --subport port=port1b,segmentation-type=vlan,segmentation-id=101 trunkb
  openstack server create --flavor ds1G --image u1804 --nic port-id=port0a --wait vma
  openstack server create --flavor ds1G --image u1804 --nic port-id=port0b --wait vmb  # booted on the same compute as vma

  At the moment I don't have a reproduction independent of that
  environment, that re-creates the same state of the bridges' FDBs and
  the same kind of
  traffic.

  Anyway, in this environment colleagues observed:

  * Lost frames.
  * Duplicated frames arriving at the vNIC of one of the VMs.
  * Unexpectedly double tagged frames on the physical bridge leaving the compute host.

  Local analysis showed that when the traffic arrived at br-int, which
  did not have the dst MAC in its FDB, br-int had to flood it to all
  ports. This way the frame ended up on both trunk bridges. One of these
  trunk bridges was on the proper way to the destination address. But
  the other trunk bridge, also not having the dst MAC in its FDB, had to
  flood to all ports. And this trunk bridge also flooded the frame to
  its tpt port back to br-int. But the tpt port conceptually is in a
  different VLAN and the frame should never have been flooded to that
  port. However the tpt port has the wrong configuration and forwards
  the traffic from the wrong VLANs.

  After the looped frame got back to br-int, it reached the intended
  VM's vNIC via the trunk parent (sic!) port. Which means that the
  latter trunk bridge learned the traffic generator's source MAC now on
  the wrong port. I have a suspicion that this may have led to the
  unexpectedly double tagged packets in the other direction.
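
  For readers who want to check their own ports against the default-mode
  rule quoted earlier from ovs-vswitchd.conf.db(5), here is a minimal
  sketch of that rule as a shell helper. The function name
  effective_vlan_mode is hypothetical and not part of OVS; it only looks
  at the vlan_mode and tag values and assumes an empty column is printed
  as [] as in the ovs-vsctl output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OVS): print the VLAN mode that the
# quoted ovs-vswitchd.conf.db(5) rule selects for a port.
#   $1 = vlan_mode column value, $2 = tag column value
# Assumption: an empty column is passed as "[]", as ovs-vsctl prints it.
effective_vlan_mode() {
    if [ -n "$1" ] && [ "$1" != "[]" ]; then
        # vlan_mode is set explicitly, so no default is selected.
        echo "$1"
    elif [ -n "$2" ] && [ "$2" != "[]" ]; then
        # vlan_mode is empty and tag contains a value: access port.
        echo "access"
    else
        # vlan_mode and tag are both empty: trunk port.
        echo "trunk"
    fi
}

# Values taken from the ovs-vsctl output in this report.
effective_vlan_mode "[]" "[]"     # tpt-3709f1a1-a5 -> trunk
effective_vlan_mode "[]" "101"    # spt-28c9689e-9e -> access
```

  With the values shown in this report, the tpt port resolves to trunk
  and the spt port to access, which matches the behavior described
  above.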