On Tue, Dec 15, 2020 at 06:43:12PM -0500, Daniel Jakots wrote:
> On Tue, 15 Dec 2020 14:30:16 +1000, David Gwynne <da...@gwynne.id.au>
> wrote:
>
> > Can you try tcpdump -p -veni em0 -D in and see if any LACP packets
> > appear to come in on the port? If not, can you remove the -p and see
> > if em0 starts to work?
> >
> > There are two main differences between how aggr(4) and trunk(4)
> > works. The first you've already found, which is that trunk(4) uses
> > the address from one of the ports it's given, while aggr(4) generates
> > one when it's created. The second difference is that trunk(4) makes
> > member ports promisc, while aggr(4) tries to be a lot more precise
> > and takes care to program the ports properly. This means that in your
> > environment em(4) has to support changing it's MAC address to the one
> > provided by aggr(4), and it has to support joining multicast groups
> > properly, including the one that LACP packets are sent to.
> >
> > tcpdump with -p means that it won't make the interface promiscuous.
> > If you don't see LACP packets come in while the port is promisc, that
> > means the multicast filter isn't working properly. It should start
> > working if you're running tcpdump without -p on the em(4) ports, or
> > on aggr(4) itself.
>
>
> Thanks for your reply!
>
> Here's what I did (spoiler alert, I couldn't get aggr0 to work):
>
> I switched back the hostname files, and rebooted.
>
> During boot:
>
> starting network
> aggr0 em0 trunkport: creating port
> aggr0 em0 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em0 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em0 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em1 trunkport: creating port
> aggr0 em1 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em1 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em1 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> aggr0 em2 trunkport: creating port
> aggr0 em2 mux: BEGIN (BEGIN) -> DETACHED
> aggr0 em2 rxm: BEGIN (BEGIN) -> INITIALIZE
> aggr0 em2 rxm: INITIALIZE (UCT) -> PORT_DISABLED
> vlan10: no link....aggr0 em0 rxm: PORT_DISABLED (port_enabled) ->
> EXPIRED .aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em1 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> ..aggr0 em0 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em2 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em1 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> ... sleeping
>
> root@pancake:~# tcpdump -p -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:04:03.996369 80:56:f2:b7:9c:09 ff:ff:ff:ff:ff:ff 8100 60: 802.1Q vid 70
> pri 1 arp who-has 10.70.70.254 tell 10.70.70.101
> 18:04:04.016123 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10
> pri 1 arp who-has 24.48.69.20 tell 24.48.69.1
> 18:04:04.034874 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10
> pri 1 arp who-has 24.48.69.109 tell 24.48.69.1
>
> (vlan10 is my uplink to my isp's modem), I didn't have anything but
> those arp who-has.
>
> root@pancake:~# ifconfig aggr0 -> still no carrier
>
> root@pancake:~# tcpdump -veni em0 -D in
> tcpdump: listening on em0, link-type EN10MB
> 18:05:11.247455 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1423: 802.1Q vid 20
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1377 (ttl 64, id 2495, len
> 1405)
> 18:05:11.248427 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1390: 802.1Q vid 20
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1344 (ttl 64, id 47470,
> len 1372)
> 18:05:11.249478 52:54:00:06:aa:01 00:0d:b9:43:9f:fc 8100 1424: 802.1Q vid 20
> pri 1 10.10.10.44.5638 > 198.48.202.251.25826: udp 1378 (ttl 64, id 57431,
> len 1406)
> 18:05:11.570690 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10
> pri 1 arp who-has 184.161.78.225 tell 184.161.78.1
> 18:05:11.586920 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10
> pri 1 arp who-has 192.222.131.28 tell 192.222.131.1
> 18:05:12.050180 00:17:10:8e:44:a5 ff:ff:ff:ff:ff:ff 8100 64: 802.1Q vid 10
> pri 1 arp who-has 24.48.76.202 tell 24.48.76.1
>
> nothing else than those udp packets (my collectd setup) and the
> arp who-has
>
> root@pancake:~# ifconfig aggr0 -> still no carrier
>
> At that point I thought "sthen asked me to try to reboot the switch,
> let's do it now" and shortly after I got in my console
> aggr0 em0 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em1 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em2 rxm: DEFAULTED (!port_enabled) -> PORT_DISABLED
> aggr0 em2 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em1 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em0 rxm: PORT_DISABLED (port_enabled) -> EXPIRED
> aggr0 em2 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em1 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
> aggr0 em0 rxm: EXPIRED (current_while_timer expired) -> DEFAULTED
>
> I tried again putting in promiscuous mode. I thought also let's do it
> on all physical interface as well to be safe :D
>
> # tcpdump -veni aggr0 -D in
> # tcpdump -veni em0 -D in
> # tcpdump -veni em1 -D in
> # tcpdump -veni em2 -D in
>
> root@pancake:~# ifconfig aggr0 -> still no carrier
By default LACP only sends packets every 30 seconds. Did you run tcpdump
for long enough to make sure you saw at least one? If you get rid of
"-D in", do you see the LACP packets that OpenBSD is transmitting?

My conclusion so far is that your switch isn't actually doing LACP. I
was going to say that you shouldn't see those ARP packets come in until
the switch also thinks that LACP has come up, but some switches have an
"independent" port mode where, if LACP isn't working, they let the
member ports operate individually. This is mostly to support being able
to PXE boot a machine before it comes up and starts negotiating LACP.
It's possible this is the state your switch thinks it's in.

Alternatively, your switch is configured with a static aggregation,
i.e. what the "loadbalance" protocol in trunk(4) does.

dlg
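
For anyone following along, the two checks suggested above (did the capture run for at least one 30 second LACP interval, and did any frame in it actually look like LACP) can be sketched in a few lines of Python. This is only an illustration, not part of the original exchange: the function names and the example source MAC are made up, and the frame test assumes the usual LACP framing (Slow Protocols multicast destination 01:80:c2:00:00:02, ethertype 0x8809, subtype 1).

```python
import struct
from datetime import datetime

# Assumed LACP framing; hypothetical names, for illustration only.
SLOW_PROTOCOLS_DST = bytes.fromhex("0180c2000002")  # Slow Protocols multicast group
SLOW_PROTOCOLS_ETHERTYPE = 0x8809
LACP_SUBTYPE = 0x01
LACP_SLOW_INTERVAL = 30  # seconds, the default interval mentioned in the mail


def is_lacpdu(frame: bytes) -> bool:
    """Return True if a raw Ethernet frame looks like a LACPDU."""
    if len(frame) < 15:
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    return (
        frame[0:6] == SLOW_PROTOCOLS_DST
        and ethertype == SLOW_PROTOCOLS_ETHERTYPE
        and frame[14] == LACP_SUBTYPE
    )


def capture_span_seconds(timestamps):
    """Seconds between the first and last tcpdump HH:MM:SS.ffffff timestamps.

    Assumes the capture does not cross midnight.
    """
    first = datetime.strptime(timestamps[0], "%H:%M:%S.%f")
    last = datetime.strptime(timestamps[-1], "%H:%M:%S.%f")
    return (last - first).total_seconds()


if __name__ == "__main__":
    # Timestamps of the first and last packets in the "-p" capture quoted above.
    span = capture_span_seconds(["18:04:03.996369", "18:04:04.034874"])
    print(f"pasted packets span {span:.3f}s; need >= {LACP_SLOW_INTERVAL}s")

    # A synthetic LACPDU header with a made-up source MAC, padded with zeros.
    frame = SLOW_PROTOCOLS_DST + bytes.fromhex("020000000001") + b"\x88\x09\x01" + bytes(50)
    print("looks like LACP:", is_lacpdu(frame))
```

The packets pasted in the quoted captures cover well under a second each, so on their own they cannot show whether a LACPDU would have arrived within a full 30 second interval.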