On 02/04/2016 01:57 PM, Jay Vosburgh wrote:
Tantilov, Emil S <emil.s.tanti...@intel.com> wrote:
We are seeing an occasional issue where the bonding driver may report an
interface as up with 0 Mbps:
bond0: link status definitely up for interface eth0, 0 Mbps full duplex
So far in all the failed traces I have collected this happens on
NETDEV_CHANGELOWERSTATE event:
<...>-20533 [000] .... 81811.041241: ixgbe_service_task: eth1: NIC Link is Up
10 Gbps, Flow Control: RX/TX
<...>-20533 [000] .... 81811.041257: ixgbe_check_vf_rate_limit
<-ixgbe_service_task
<...>-20533 [000] .... 81811.041272: ixgbe_ping_all_vfs <-ixgbe_service_task
kworker/u48:0-7503 [010] .... 81811.041345: ixgbe_get_stats64 <-dev_get_stats
kworker/u48:0-7503 [010] .... 81811.041393: bond_netdev_event: eth1: event: 1b
kworker/u48:0-7503 [010] .... 81811.041394: bond_netdev_event: eth1: IFF_SLAVE
kworker/u48:0-7503 [010] .... 81811.041395: bond_netdev_event: eth1:
slave->speed = ffffffff
<...>-20533 [000] .... 81811.041407: ixgbe_ptp_overflow_check
<-ixgbe_service_task
kworker/u48:0-7503 [010] .... 81811.041407: bond_mii_monitor: bond0: link
status definitely up for interface eth1, 0 Mbps full duplex
From looking at the code that prints this, the "full" duplex is
probably actually DUPLEX_UNKNOWN, but the netdev_info uses the
expression slave->duplex ? "full" : "half", so DUPLEX_UNKNOWN at 0xff
would print "full."
This is what ixgbe_get_settings returns for speed and duplex if
it is called when carrier is off.
I agree with you completely; I think this is the root cause.
Best Regards!
Zhu Yanjun
As a proof of concept I added NETDEV_CHANGELOWERSTATE in
bond_slave_netdev_event() along with NETDEV_UP/CHANGE:
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b5605..a9dac4c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -3014,6 +3014,7 @@ static int bond_slave_netdev_event(unsigned long event,
break;
case NETDEV_UP:
case NETDEV_CHANGE:
+ case NETDEV_CHANGELOWERSTATE:
bond_update_speed_duplex(slave);
if (BOND_MODE(bond) == BOND_MODE_8023AD)
bond_3ad_adapter_speed_duplex_changed(slave);
With this change I have not seen 0 Mbps reported by the bonding driver
(about 12 hours of testing so far, vs. failures within 2-3 hours
otherwise). Although I suppose it could also be some sort of race/timing
issue with bond_mii_monitor().
This change as a fix seems kind of odd, since CHANGELOWERSTATE
is generated by bonding itself. Perhaps the net effect is to add a
delay and then update the speed and duplex, masking the actual problem.
Emil, if I recall correctly, the test patch I sent that uses the
notifiers directly instead of miimon (specify miimon=0 and have bonding
respond to the notifiers) handled everything properly, right? If so I
can split that up and submit it properly; it seems more like a feature
than a straightforward bug fix, so I'm not sure it's appropriate for
net.
As a possibly less complex alternative for the miimon > 0 case,
could you try the following:
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 56b560558884..ac8921e65f26 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2120,6 +2120,7 @@ static void bond_miimon_commit(struct bonding *bond)
{
struct list_head *iter;
struct slave *slave, *primary;
+ int link_state;
bond_for_each_slave(bond, slave, iter) {
switch (slave->new_link) {
@@ -2127,6 +2128,10 @@ static void bond_miimon_commit(struct bonding *bond)
continue;
case BOND_LINK_UP:
+ link_state = bond_check_dev_link(bond, slave->dev, 0);
+ if (!link_state)
+ continue;
+ bond_update_speed_duplex(slave);
bond_set_slave_link_state(slave, BOND_LINK_UP,
BOND_SLAVE_NOTIFY_NOW);
slave->last_link_up = jiffies;
This will make bonding recheck the link state and update the
speed and duplex after it acquires RTNL to commit a link change. This
probably still has a race, since the change of carrier state in the
device is not mutexed by anything bonding can acquire (so it can always
change as soon as it's checked).
Thanks,
-J
---
-Jay Vosburgh, jay.vosbu...@canonical.com