On Mon, Jan 07, 2008 at 06:57:25PM +0100, Krzysztof Oledzki wrote:
> 
> 
> On Wed, 19 Dec 2007, Andy Gospodarek wrote:
> 
> >On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
> >>
> >>
> >>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
> >>
> >>>On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
> >>>>
> >>>>
> >>>>On Fri, 14 Dec 2007, Andy Gospodarek wrote:
> >>>>
> >>>>>On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
> >>>>>>
> >>>>>>
> >>>>>>On Wed, 12 Dec 2007, Jay Vosburgh wrote:
> >>>>>>
> >>>>>>>Herbert Xu <[EMAIL PROTECTED]> wrote:
> >>>>>>>
> >>>>>>>>>diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>>>drivers/net/bonding/bond_sysfs.c
> >>>>>>>>>--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
> >>>>>>>>>+++ a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>>>@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
> >>>>>>>>>out:
> >>>>>>>>>    write_unlock_bh(&bond->lock);
> >>>>>>>>>
> >>>>>>>>>-       rtnl_unlock();
> >>>>>>>>>-
> >>>>>>>>
> >>>>>>>>Looking at the changeset that added this perhaps the intention
> >>>>>>>>is to hold the lock? If so we should add an rtnl_lock to the start
> >>>>>>>>of the function.
> >>>>>>>
> >>>>>>>       Yes, this function needs to hold locks, and more than just
> >>>>>>>what's there now.  I believe the following should be correct; I 
> >>>>>>>haven't
> >>>>>>>tested it, though (I'm supposedly on vacation right now).
> >>>>>>>
> >>>>>>>       The following change should be correct for the
> >>>>>>>bonding_store_primary case discussed in this thread, and also 
> >>>>>>>corrects
> >>>>>>>the bonding_store_active case which performs similar functions.
> >>>>>>>
> >>>>>>>       The bond_change_active_slave and bond_select_active_slave
> >>>>>>>functions both require rtnl, bond->lock for read and curr_slave_lock
> >>>>>>>for
> >>>>>>>write_bh, and no other locks.  This is so that the lower level
> >>>>>>>mode-specific functions can release locks down to just rtnl in order 
> >>>>>>>to
> >>>>>>>call, e.g., dev_set_mac_address with the locks it expects (rtnl 
> >>>>>>>only).
> >>>>>>>
> >>>>>>>Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
> >>>>>>>
> >>>>>>>diff --git a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>b/drivers/net/bonding/bond_sysfs.c
> >>>>>>>index 11b76b3..28a2d80 100644
> >>>>>>>--- a/drivers/net/bonding/bond_sysfs.c
> >>>>>>>+++ b/drivers/net/bonding/bond_sysfs.c
> >>>>>>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
> >>>>>>>device
> >>>>>>>*d,
> >>>>>>>       struct slave *slave;
> >>>>>>>       struct bonding *bond = to_bond(d);
> >>>>>>>
> >>>>>>>-      write_lock_bh(&bond->lock);
> >>>>>>>+      rtnl_lock();
> >>>>>>>+      read_lock(&bond->lock);
> >>>>>>>+      write_lock_bh(&bond->curr_slave_lock);
> >>>>>>>+
> >>>>>>>       if (!USES_PRIMARY(bond->params.mode)) {
> >>>>>>>               printk(KERN_INFO DRV_NAME
> >>>>>>>                      ": %s: Unable to set primary slave; %s is in 
> >>>>>>>                      mode
> >>>>>>>                      %d\n",
> >>>>>>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
> >>>>>>>device
> >>>>>>>*d,
> >>>>>>>               }
> >>>>>>>       }
> >>>>>>>out:
> >>>>>>>-      write_unlock_bh(&bond->lock);
> >>>>>>>-
> >>>>>>>+      write_unlock_bh(&bond->curr_slave_lock);
> >>>>>>>+      read_unlock(&bond->lock);
> >>>>>>>       rtnl_unlock();
> >>>>>>>
> >>>>>>>       return count;
> >>>>>>>@@ -1190,7 +1193,8 @@ static ssize_t 
> >>>>>>>bonding_store_active_slave(struct
> >>>>>>>device *d,
> >>>>>>>       struct bonding *bond = to_bond(d);
> >>>>>>>
> >>>>>>>       rtnl_lock();
> >>>>>>>-      write_lock_bh(&bond->lock);
> >>>>>>>+      read_lock(&bond->lock);
> >>>>>>>+      write_lock_bh(&bond->curr_slave_lock);
> >>>>>>>
> >>>>>>>       if (!USES_PRIMARY(bond->params.mode)) {
> >>>>>>>               printk(KERN_INFO DRV_NAME
> >>>>>>>@@ -1247,7 +1251,8 @@ static ssize_t 
> >>>>>>>bonding_store_active_slave(struct
> >>>>>>>device *d,
> >>>>>>>               }
> >>>>>>>       }
> >>>>>>>out:
> >>>>>>>-      write_unlock_bh(&bond->lock);
> >>>>>>>+      write_unlock_bh(&bond->curr_slave_lock);
> >>>>>>>+      read_unlock(&bond->lock);
> >>>>>>>       rtnl_unlock();
> >>>>>>>
> >>>>>>>       return count;
> >>>>>>
> >>>>>>Vanilla 2.6.24-rc5 plus this patch:
> >>>>>>
> >>>>>>=========================================================
> >>>>>>[ INFO: possible irq lock inversion dependency detected ]
> >>>>>>2.6.24-rc5 #1
> >>>>>>---------------------------------------------------------
> >>>>>>events/0/9 just changed the state of lock:
> >>>>>>(&mc->mca_lock){-+..}, at: [<c0411c7a>] 
> >>>>>>mld_ifc_timer_expire+0x130/0x1fb
> >>>>>>but this lock took another, soft-read-irq-unsafe lock in the past:
> >>>>>>(&bond->lock){-.--}
> >>>>>>
> >>>>>>and interrupts could create inverse lock ordering between them.
> >>>>>>
> >>>>>>
> >>>>>
> >>>>>Grrr, I should have seen that -- sorry.  Try your luck with this 
> >>>>>instead:
> >>>><CUT>
> >>>>
> >>>>No luck.
> >>>>
> >>>
> >>>
> >>>I'm guessing if we go back to using a write-lock for bond->lock this
> >>>will go back to working again, but I'm not totally convinced since there
> >>>are plenty of places where we used a read-lock with it.
> >>
> >>Should I check this patch or rather, based on a future discussion, wait
> >>for another version?
> >>
> >>>
> >>>diff --git a/drivers/net/bonding/bond_sysfs.c
> >>>b/drivers/net/bonding/bond_sysfs.c
> >>>index 11b76b3..635b857 100644
> >>>--- a/drivers/net/bonding/bond_sysfs.c
> >>>+++ b/drivers/net/bonding/bond_sysfs.c
> >>>@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
> >>>*d,
> >>>   struct slave *slave;
> >>>   struct bonding *bond = to_bond(d);
> >>>
> >>>+  rtnl_lock();
> >>>   write_lock_bh(&bond->lock);
> >>>+  write_lock_bh(&bond->curr_slave_lock);
> >>>+
> >>>   if (!USES_PRIMARY(bond->params.mode)) {
> >>>           printk(KERN_INFO DRV_NAME
> >>>                  ": %s: Unable to set primary slave; %s is in mode
> >>>                  %d\n",
> >>>@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
> >>>*d,
> >>>           }
> >>>   }
> >>>out:
> >>>+  write_unlock_bh(&bond->curr_slave_lock);
> >>>   write_unlock_bh(&bond->lock);
> >>>-
> >>>   rtnl_unlock();
> >>>
> >>>   return count;
> >>>@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
> >>>device *d,
> >>>
> >>>   rtnl_lock();
> >>>   write_lock_bh(&bond->lock);
> >>>+  write_lock_bh(&bond->curr_slave_lock);
> >>>
> >>>   if (!USES_PRIMARY(bond->params.mode)) {
> >>>           printk(KERN_INFO DRV_NAME
> >>>@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
> >>>device *d,
> >>>           }
> >>>   }
> >>>out:
> >>>+  write_unlock_bh(&bond->curr_slave_lock);
> >>>   write_unlock_bh(&bond->lock);
> >>>   rtnl_unlock();
> >>>
> >>
> >>
> >>Best regards,
> >>
> >>                                    Krzysztof Olędzki
> >
> >For now, I prefer Jay's original patch -- with the read_locks (rather
> >than read/write_lock_bh) and the added rtnl_lock.  There is still a
> >lockdep issue that we need to sort-out, but this patch is needed first.
> 
> This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any 
> chances to cure it before 2.6.24-final?
> 
> Best regards,
> 
>                               Krzysztof Olędzki

Krzysztof,

I doubt the lockdep issue will be fixed, but the patch Jay posted and I
acked needs to be included in 2.6.24.

I played around with the locking when setting the multicast list and I
can make the lockdep issue go away, but I need to be sure that it's OK
to switch it to a read-lock from a write-lock (and I don't really think
it is).

-andy

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to