On 12/05/2017 3:31 PM, Kyle Larose wrote:
I'm adding the dev mailing list/link bonding maintainer, because I've done some
more investigation and I'm beginning to think something is wrong.
-----Original Message-----
From: Kyle Larose
Sent: Thursday, May 11, 2017 4:55 PM
To: us...@dpdk.org
Subject: active_backup link bonding and mac address
Hey fellow DPDK users,
I have a question about the link bond pmd.
I am running 4 X710 interfaces in a link bond pmd for my application. In
LACP mode, everything works fine. But, in active_backup mode, if the primary
link fails, my application stops working. The reason is that I'm still
sending packets with the original MAC address of the link bond pmd, which is
that of the original primary slave. However, the new primary is not in
promiscuous mode, so traffic coming back with that MAC address drops.
What should I be doing here:
1) Should I be listening for the changes in the state of the primary, and
updating the MAC address I use to send? (I have it cached for efficiency)
2) Should the driver be placing the interface into promiscuous mode to allow
for this, similar to what LACP does?
3) Should the driver be overwriting the MAC on egress, similar to what the
tlb driver seems to do (in bond_ethdev_tx_burst_tlb)?
I'm fine with #1, but it seems to break the goal of having the link bond pmd
be transparent to the application.
I checked the MAC address of the link bond interface after the failover, and it
did not change. It still had the MAC address of the first slave that was added.
This seems incompatible with solution number 1 that I suggested above, which
means either the link bond device should update its address, or it should be
promiscuous at the slave level.
FWIW, I'm using 16.07. I have reproduced this on testpmd by looking at port
state (with some fiddling -- needed to prevent it from starting the slave
interfaces, and turn off its default promiscuous mode).
Does anyone have any input on this problem?
Thanks,
Kyle
Kyle, sorry I didn't see the post in the users list. I think the issue
is that the new primary is missing the bond MAC address on its valid
MACs list, hence it is dropping the ingress packets after a fail-over
event. Placing all the slave devices into promiscuous mode, as you
suggest in option 2, would probably make the issue go away, but I don't
think it's the correct solution. I think we should just be adding the
bond MAC to each slave device's valid MAC list. As only one bond slave
is active at any time, this won't cause any issues to the network, and
it will mean that fail-over is transparent to your application and
there is no need for MAC rewrites, which would invalidate existing ARP
table entries at downstream end points.
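
To make the reasoning concrete, here is a rough toy model in plain C. It
is not DPDK code and not a patch: struct toy_port, toy_port_add_mac(),
toy_port_accepts() and toy_bond_sync_mac() are hypothetical names made
up for illustration, and the "filter table" only stands in for a NIC's
unicast MAC filter list. It shows why a non-promiscuous slave drops
frames addressed to the bond MAC until that MAC is added to its list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAC_LEN     6
#define MAX_FILTERS 4   /* arbitrary size chosen for the toy model */

/* Toy stand-in for a NIC port: a small table of accepted unicast MACs. */
struct toy_port {
    uint8_t filters[MAX_FILTERS][MAC_LEN];
    int     n_filters;
    bool    promiscuous;
};

/* Add a MAC to the port's accept list.
 * Returns 0 on success (or if already present), -1 if the table is full. */
int toy_port_add_mac(struct toy_port *p, const uint8_t mac[MAC_LEN])
{
    assert(p != NULL && mac != NULL);
    for (int i = 0; i < p->n_filters; i++)
        if (memcmp(p->filters[i], mac, MAC_LEN) == 0)
            return 0;
    if (p->n_filters == MAX_FILTERS)
        return -1;
    memcpy(p->filters[p->n_filters++], mac, MAC_LEN);
    return 0;
}

/* Would this port accept a frame with the given destination MAC? */
bool toy_port_accepts(const struct toy_port *p, const uint8_t dst[MAC_LEN])
{
    assert(p != NULL && dst != NULL);
    if (p->promiscuous)
        return true;
    for (int i = 0; i < p->n_filters; i++)
        if (memcmp(p->filters[i], dst, MAC_LEN) == 0)
            return true;
    return false;
}

/* Proposed behaviour in miniature: put the bond MAC on every slave's
 * accept list, so whichever slave becomes primary already accepts it.
 * Returns 0 on success, -1 if any slave's table is full. */
int toy_bond_sync_mac(struct toy_port *slaves, int n_slaves,
                      const uint8_t bond_mac[MAC_LEN])
{
    for (int i = 0; i < n_slaves; i++)
        if (toy_port_add_mac(&slaves[i], bond_mac) != 0)
            return -1;
    return 0;
}
```

With two slaves that each hold only their own MAC, toy_port_accepts() on
the second slave returns false for the first slave's MAC (the bond MAC),
which is the post-failover drop described above. After
toy_bond_sync_mac() it returns true, with no promiscuous mode and no MAC
rewrite. In the real bonding PMD I would expect the equivalent step to
go through rte_eth_dev_mac_addr_add() on each slave port, but that is my
assumption about how a fix would be wired in, not something I have
tested.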