Jay,

while the scenario you describe makes sense, it does not match what we see
in our tests.

The instrumentation prints info every time we enter bond_mii_monitor(),
bond_miimon_inspect(), or bond_miimon_commit(), and every time we commit a
link state. We also print a message every time we propose BOND_LINK_FAIL
in bond_miimon_inspect().

So in your scenario,

2) bond_mii_monitor rtnl_trylock fails, it reschedules

we should see a bond_miimon_inspect() 'proposed BOND_LINK_FAIL' debugging
message without a matching 'commit' message. But what we see in reality is
that for each 'proposed' there is a 'commit' message. (And we don't expect
ens3f1 to really go down when the VC module is rebooted - it is not
connected to it.)

Here is the debugging output (with the fix I suggested in my first email
*applied*); my comments are inline.

  (FYI: in "bond_mii_monitor: ens3f0 current link state: 0" lines we print slave->link when entering bond_mii_monitor)

***

Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f0 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f1 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f0 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f1 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f1
<snipped NetworkManager lines>
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f0 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f1 current link state: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: proposed BOND_LINK_FAIL on slave ens3f0
             /*FALLTHRU*/
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_FAIL case on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: set new_link=BOND_LINK_DOWN on slave ens3f0

Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect returned non-zero

   As you can see in the lines above, we reach BOND_LINK_FAIL on ens3f0
   only; ens3f1 has a good link_state, so we do not reach the
   BOND_LINK_FAIL fallthru and do not propose anything (otherwise there
   would be debugging output for it).

   Now we are in the bond_mii_monitor() bond_for_each_slave() loop,
   committing link states - and suddenly we have state 0->1 for interface
   ens3f1 as well as for ens3f0

Oct 31 09:09:25 SYDC1LNX kernel: bond0: committing link state 0->1 for interface ens3f0, 0 Mbps full duplex
Oct 31 09:09:25 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f0 now: 1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: committing link state 0->1 for interface ens3f1, 20000 Mbps full duplex
Oct 31 09:09:25 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f1 now: 1

   Entering bond_miimon_commit()
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_DOWN
Oct 31 09:09:25 SYDC1LNX kernel: bond0: link status definitely down for interface ens3f0, disabling it
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_find_best_slave: slave->link: 2, up: false, slave->delay: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_find_best_slave: slave->link: 1, up: true, slave->delay: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_find_best_slave: no bestslave found, bond failure imminent
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_select_active_slave: best_slave != curr_active_slave
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_select_active_slave: best_slave is NULL, this is probably bad
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_change_active_slave: old_active: ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_change_active_slave: new_active: NULL
Oct 31 09:09:25 SYDC1LNX kernel: bond0: Removing MAC from old_active
Oct 31 09:09:25 SYDC1LNX kernel: bond0: ALB and TLB modes should call bond_alb_handle_active_change
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_set_carrier: turning carrier off
Oct 31 09:09:25 SYDC1LNX kernel: bond0: now running without any active interface!

        Even though the link state of ens3f1 has changed, we have BOND_LINK_NOCHANGE
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_NOCHANGE on slave ens3f1

    Another invocation of bond_mii_monitor
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f0 current link state: 2
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f1 current link state: 1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_DOWN case on slave ens3f0

Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_FAIL case on slave ens3f1

   We see "up again on ens3f1" and execute slave->new_link = BOND_LINK_UP;
   (my initial suggestion)
Oct 31 09:09:25 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_inspect returned non-zero
Oct 31 09:09:25 SYDC1LNX kernel: bond0: committing link state 2->2 for interface ens3f0, 0 Mbps full duplex
Oct 31 09:09:25 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f0 now: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: committing link state 1->0 for interface ens3f1, 20000 Mbps full duplex
Oct 31 09:09:25 SYDC1LNX kernel: bond0: slave->should_notify_link for interface ens3f1 now: 0

   As we nudged slave->new_link, bond_miimon_commit() now works
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_NOCHANGE on slave ens3f0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: working on slave ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_miimon_commit: BOND_LINK_UP
Oct 31 09:09:25 SYDC1LNX kernel: bond0: link status definitely up for interface ens3f1, 20000 Mbps full duplex
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_find_best_slave: slave->link: 2, up: false, slave->delay: 0
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_find_best_slave: slave ens3f1 BOND_LINK_UP
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_select_active_slave: best_slave != curr_active_slave
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_change_active_slave: old_active: NULL
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_change_active_slave: new_active: ens3f1
Oct 31 09:09:25 SYDC1LNX kernel: bond0: Non-BOND_LINK_BACK case
Oct 31 09:09:25 SYDC1LNX kernel: bond0: making interface ens3f1 the new active one
Oct 31 09:09:25 SYDC1LNX kernel: bond0: Setting MAC on new_active
Oct 31 09:09:25 SYDC1LNX kernel: bond0: ALB and TLB modes should call bond_alb_handle_active_change
Oct 31 09:09:25 SYDC1LNX kernel: bond0: bond_set_carrier: turning carrier on
Oct 31 09:09:25 SYDC1LNX kernel: bond0: first active interface up!
<snipped NetworkManager lines>
Oct 31 09:09:26 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f0 current link state: 2
Oct 31 09:09:26 SYDC1LNX kernel: bond0: bond_mii_monitor: ens3f1 current link state: 0
Oct 31 09:09:26 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_DOWN case on slave ens3f0
Oct 31 09:09:26 SYDC1LNX kernel: bond0: bond_miimon_inspect: entered BOND_LINK_UP case on slave ens3f1
...

Alex

On 11/02/2017 09:11 PM, Jay Vosburgh wrote:
Alex Sidorenko <alexandre.sidore...@hpe.com> wrote:
On 11/02/2017 12:51 AM, Jay Vosburgh wrote:
Jarod Wilson <ja...@redhat.com> wrote:

On 2017-11-01 8:35 PM, Jay Vosburgh wrote:
Jay Vosburgh <jay.vosbu...@canonical.com> wrote:

Alex Sidorenko <alexandre.sidore...@hpe.com> wrote:

The problem was found while trying to deploy RHEL7 on the HPE Synergy
platform; it is seen both in the customer's environment and in the HPE test lab.

There are several bonds configured in TLB mode and miimon=100, all other
options are default. Slaves are connected to VirtualConnect
modules. Rebooting a VC module should bring one bond slave (ens3f0) down
temporarily, but not the other one (ens3f1). But what we see is

Oct 24 10:37:12 SYDC1LNX kernel: bond0: link status up again after 0 ms for interface ens3f1

        In net-next, I don't see a path in the code that will lead to
this message, as it would apparently require entering
bond_miimon_inspect in state BOND_LINK_FAIL but with downdelay set to 0.
If downdelay is 0, the code will transition to BOND_LINK_DOWN and not
remain in _FAIL state.

The kernel in question is laden with a fair bit of additional debug spew,
as we were going back and forth, trying to isolate where things were going
wrong.  That was indeed from the BOND_LINK_FAIL state in
bond_miimon_inspect, inside the if (link_state) clause though, so after
commit++, there's a continue, which ... does what now? Doesn't it take us
back to the top of the bond_for_each_slave_rcu() loop, so we bypass the
next few lines of code that would have led to a transition to
BOND_LINK_DOWN?

        Just to confirm: your downdelay is 0, correct?

Correct.


        And, do you get any other log messages other than "link status
up again after 0 ms"?

Yes, here are some messages (from an early instrumentation run):
[...]
That is, we never see ens3f1 going to BOND_LINK_DOWN; it just stays in
BOND_LINK_NOCHANGE/BOND_LINK_FAIL.
        To answer your question, yes, the "if (link_state) {" block in
the BOND_LINK_FAIL case of bond_miimon_inspect ends in continue, but
this path is nominally for the downdelay logic.  If downdelay is active
and the link recovers before the delay expires, the link should never
have moved to BOND_LINK_DOWN.  The commit++ causes bond_miimon_inspect
to return nonzero, causing in turn the bond_propose_link_state change to
BOND_LINK_FAIL state to be committed.  This path deliberately does not
set slave->new_link, as downdelay is purposely delaying the transition
to BOND_LINK_DOWN.

        If downdelay is 0, the slave->link should not persist in
BOND_LINK_FAIL state; it should set new_link = BOND_LINK_DOWN which will
cause a transition in bond_miimon_commit.  The bond_propose_link_state
call to set BOND_LINK_FAIL in the BOND_LINK_UP case will be committed in
bond_mii_monitor prior to calling bond_miimon_commit, which will in turn
do the transition to _DOWN state.  In this case, the BOND_LINK_FAIL case
"if (link_state) {" block should never be entered.

I totally agree with your description of the transition logic, and this is
why we were puzzled by how this could occur, until we noticed NetworkManager
messages around this time and decided to run a test without it.
Without NM, everything works as expected. After that, adding more
instrumentation, we found that BOND_LINK_FAIL is not proposed inside
bond_miimon_inspect() but elsewhere (NetworkManager?).

        I think I see the flaw in the logic.

        1) bond_miimon_inspect finds link_state = 0, then makes a call
to bond_propose_link_state(BOND_LINK_FAIL), setting link_new_state to
BOND_LINK_FAIL.  _inspect then sets slave->new_link = BOND_LINK_DOWN and
returns non-zero.

        2) bond_mii_monitor rtnl_trylock fails, it reschedules.

        3) bond_mii_monitor runs again, and calls bond_miimon_inspect.

        4) the slave's link has recovered, so link_state != 0.
slave->link is still BOND_LINK_UP.  The slave's link_new_state remains
set to BOND_LINK_FAIL, but new_link is reset to NOCHANGE.
bond_miimon_inspect returns 0, so nothing is committed.

        5) step 4 can repeat indefinitely.

        6) eventually, the other slave does something that causes
commit++, making bond_mii_monitor call bond_commit_link_state and then
bond_miimon_commit.  The slave in question from steps 1-4 still has
link_new_state as BOND_LINK_FAIL, but new_link is NOCHANGE, so it ends
up in BOND_LINK_FAIL state.

        I think step 6 could also occur concurrently with the initial
pass through step 4 to induce the problem.

        It looks like Mahesh mostly fixed this in

commit fb9eb899a6dc663e4a2deed9af2ac28f507d0ffb
Author: Mahesh Bandewar <mahe...@google.com>
Date:   Tue Apr 11 22:36:00 2017 -0700

     bonding: handle link transition from FAIL to UP correctly

        but the window still exists, and requires the slave link state
to change between the failed rtnl_trylock and the second pass through
_inspect.  The problem is that a proposed state transition is kept between
invocations of _inspect, but the condition that induced the transition has
changed.

        I haven't tested these, but I suspect the solution is either to
clear link_new_state on entry to the loop in bond_miimon_inspect, or
merge new_link and link_new_state as a single "next state" (which is
cleared on entry to the loop).

        The first of these is a pretty simple patch:

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 18b58e1376f1..6f89f9981a6c 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2046,6 +2046,7 @@ static int bond_miimon_inspect(struct bonding *bond)
 	bond_for_each_slave_rcu(bond, slave, iter) {
 		slave->new_link = BOND_LINK_NOCHANGE;
+		slave->link_new_state = slave->link;
 
 		link_state = bond_check_dev_link(bond, slave->dev, 0);
        Alex / Jarod, could you check my logic, and would you be able to
test this patch if my analysis appears sound?

        Thanks,

        -J

---
        -Jay Vosburgh, jay.vosbu...@canonical.com


--
------------------------------------------------------------------
Alex Sidorenko  email: a...@hpe.com
ERT  Linux      Hewlett-Packard Enterprise (Canada)
------------------------------------------------------------------
