Hello all, I'm looking for some guidance in chasing what I believe to be a bug in kernel traffic control filters. If I'm pinging the wrong list let me know.
I have a homebrew MACSec bridge setup using two pairs of PCs. I establish a MACSec link between them, and then use TC to bridge a second ethernet interface over the MACSec link. The second interface is connected to a Juniper switch at each end, and I'm using LACP over the links to bond them up for redundancy. It turns out I need that redundancy, as after a while one pair of bridges will stop flowing packets in one direction. I've since replicated this failure with a group of VMs as well.

My test setup to replicate the failure inside ESXi:

- Two MACSec bridge VMs, A and Z
- Two IPerf VMs, A and Z

My VMs are currently built using Ubuntu Server 18.04 to be quick; no additional packages are required outside of iperf3. The kernel version as currently shipped is 4.15.0-36. I highly advise using a CPU with AES instruction support, as MACSec eats CPU without it and it will take longer to reproduce the symptoms.

- A 'MACSec Bridge' network
- An 'A Side link' network
- A 'Z Side link' network

In ESXi I used a dedicated vSwitch with a 9000 MTU (to allow full 1500 byte ethernet packets plus MACSec overhead to pass on the bridge), and the security policy is fully open (allow promiscuous, allow forged, allow MAC changes) as we're abusing the networks as direct point to point links. If using physical machines, just cable up; my example script bumps the MTU as required.

The MACSec boxes have two ethernet interfaces each. One interface on each box is on the MACSec Bridge network. The other interfaces go to the A and Z IPerf boxes respectively via their dedicated networks. The A and Z IPerf boxes need their interfaces configured with IPs in a common subnet, such as 192.168.0.1/30 and 192.168.0.2/30.

My script sets up MACSec, tweaks MTUs, and touches a few sysctls to turn the involved interfaces into silent actors. It then uses TC to start the actual bridging. From there I've been firing up iperf3 sessions in both directions between A and Z to hammer the bridge until it fails.
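A minimal sketch of that bidirectional load, run from the A side IPerf box, might look like the following. It is an illustration under assumptions, not the exact commands used: the peer address is the example 192.168.0.2 from above, while the two port numbers, the 3600 second run length, and the helper names run and run_pair are made up for the example. It also assumes the Z side box is already running two servers, started with "iperf3 -s -p 5201 -D" and "iperf3 -s -p 5202 -D".

```shell
#!/bin/bash
# Sketch only: keep iperf3 traffic flowing in both directions across the bridge.
# All values below are illustrative assumptions.
peer=192.168.0.2   # Z side IPerf box (example address)
fwd_port=5201      # session sending A -> Z
rev_port=5202      # session sending Z -> A (iperf3 -R reverses direction)
duration=3600      # seconds per iperf3 run

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@" > /dev/null
    fi
}

# Start one forward and one reverse session, then wait for both to end.
run_pair() {
    run iperf3 -c "$peer" -p "$fwd_port" -t "$duration" &
    run iperf3 -c "$peer" -p "$rev_port" -t "$duration" -R &
    wait
}

# Only generate traffic when explicitly requested with RUN_LOAD=1.
if [ "${RUN_LOAD:-0}" = "1" ]; then
    while true; do
        run_pair
        sleep 1   # brief pause so a dead link does not turn this into a busy loop
    done
fi
```

Setting DRY_RUN=1 prints the two client command lines instead of running them, which is a quick way to check the addresses and ports before starting a multi-hour run with RUN_LOAD=1.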
When the bridge fails, I can see packets stop being bridged in one direction on one MACSec host, but not the other. The second host continues to flow packets in both directions. Nothing is logged to dmesg when this fault occurs.

The fault seems to occur at roughly the same packet / traffic amount each time. On my main application it's after approximately 2.5TB of traffic (random mix of sizes), and with my test bed it was after 5.5TB of 1500 byte packets.

On the impacted MACSec node, watching interface packet counters via ifconfig and actual traffic with tcpdump, I can see packets coming in on the MACSec interface and going out the host interface, and the host's reply coming in on the host interface but not showing up on the MACSec interface to cross the bridge. Clearing out the tc filter and qdisc and re-adding them does not restore traffic flow.

There is a PPA with 4.18 available for Ubuntu that I'm going to test with next to see if that makes a difference in behavior. In the meantime I'd appreciate any suggestions on how to diagnose this.

My MACSec bridge setup script follows. Update sif, dif, the keys and rxmac to match your setup. The rxmac is the MAC address of the remote bridge interface. Keys need to be flipped between systems.
-----------------------
#!/bin/bash

# Interfaces:
# sif = Ingress physical interface (Source)
# dif = Egress physical interface (Dest)
# eif = Encrypted interface
sif=eno2
dif=enp1s0f0
eif=macsec0

# MACSec Keys:
# txkey = Transmit (Local) key
# rxkey = Receive (Remote) key
# rxmac = Receive (Remote) MAC addy
txkey=00000000000000000000000000000000
rxkey=99999999999999999999999999999999
rxmac=00:11:22:33:44:55

# Use jumbo frames for macsec to allow full 1500 MTU passthrough:
echo "* MTU update"
ip link set "$sif" mtu 9000
ip link set "$dif" mtu 9000

# Bring up macsec:
echo "* Enable MACSec"
modprobe macsec
ip link add link "$dif" "$eif" type macsec
ip macsec add "$eif" tx sa 0 pn 1 on key 02 "$txkey"
ip macsec add "$eif" rx address "$rxmac" port 1
ip macsec add "$eif" rx address "$rxmac" port 1 sa 0 pn 1 on key 01 "$rxkey"
ip link set "$eif" type macsec encrypt on
#ip link set "$eif" type macsec replay on window 64

# Keep system from trying to respond to observed traffic:
echo "* Clamp the system so bridge ports NEVER respond to traffic"
sysctl -w net.ipv4.conf.default.arp_filter=1
sysctl -w net.ipv4.conf.all.arp_filter=1

ip link set "$sif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$sif".autoconf=0
sysctl -w net.ipv6.conf."$sif".accept_ra=0
sysctl -w net.ipv4.conf."$sif".arp_ignore=8
sysctl -w net.ipv4.conf."$sif".rp_filter=0

ip link set "$dif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$dif".autoconf=0
sysctl -w net.ipv6.conf."$dif".accept_ra=0
sysctl -w net.ipv4.conf."$dif".arp_ignore=8
sysctl -w net.ipv4.conf."$dif".rp_filter=0

ip link set "$eif" down promisc on arp off multicast off
sysctl -w net.ipv6.conf."$eif".autoconf=0
sysctl -w net.ipv6.conf."$eif".accept_ra=0
sysctl -w net.ipv4.conf."$eif".arp_ignore=8
sysctl -w net.ipv4.conf."$eif".rp_filter=0

# Set up traffic mirroring:
echo "* Start Port Mirror"

# sif to eif
tc qdisc add dev "$sif" ingress
tc filter add dev "$sif" parent ffff: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev "$eif"

# eif to sif
tc qdisc add dev "$eif" ingress
tc filter add dev "$eif" parent ffff: \
    protocol all \
    u32 match u8 0 0 \
    action mirred egress mirror dev "$sif"

# Bring up the interfaces:
echo "* Light tunnel NICS"
ip link set "$sif" up
ip link set "$dif" up
ip link set "$eif" up

echo " --=[ MACSec Up ]=--"
-----------------------

Josh Coombs