On Wed, 19 Dec 2007, Andy Gospodarek wrote:
On Tue, Dec 18, 2007 at 08:53:39PM +0100, Krzysztof Oledzki wrote:
On Fri, 14 Dec 2007, Andy Gospodarek wrote:
On Fri, Dec 14, 2007 at 07:57:42PM +0100, Krzysztof Oledzki wrote:
On Fri, 14 Dec 2007, Andy Gospodarek wrote:
On Fri, Dec 14, 2007 at 05:14:57PM +0100, Krzysztof Oledzki wrote:
On Wed, 12 Dec 2007, Jay Vosburgh wrote:
Herbert Xu <[EMAIL PROTECTED]> wrote:
diff -puN drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
drivers/net/bonding/bond_sysfs.c
--- a/drivers/net/bonding/bond_sysfs.c~bonding-locking-fix
+++ a/drivers/net/bonding/bond_sysfs.c
@@ -1111,8 +1111,6 @@ static ssize_t bonding_store_primary(str
out:
write_unlock_bh(&bond->lock);
- rtnl_unlock();
-
Looking at the changeset that added this perhaps the intention
is to hold the lock? If so we should add an rtnl_lock to the start
of the function.
Yes, this function needs to hold locks, and more than just
what's there now. I believe the following should be correct; I haven't
tested it, though (I'm supposedly on vacation right now).
The following change should be correct for the
bonding_store_primary case discussed in this thread, and also corrects
the bonding_store_active case which performs similar functions.
The bond_change_active_slave and bond_select_active_slave
functions both require rtnl, bond->lock for read and curr_slave_lock
for
write_bh, and no other locks. This is so that the lower level
mode-specific functions can release locks down to just rtnl in order to
call, e.g., dev_set_mac_address with the locks it expects (rtnl only).
Signed-off-by: Jay Vosburgh <[EMAIL PROTECTED]>
diff --git a/drivers/net/bonding/bond_sysfs.c
b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..28a2d80 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct
device
*d,
struct slave *slave;
struct bonding *bond = to_bond(d);
- write_lock_bh(&bond->lock);
+ rtnl_lock();
+ read_lock(&bond->lock);
+ write_lock_bh(&bond->curr_slave_lock);
+
if (!USES_PRIMARY(bond->params.mode)) {
printk(KERN_INFO DRV_NAME
": %s: Unable to set primary slave; %s is in mode
%d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct
device
*d,
}
}
out:
- write_unlock_bh(&bond->lock);
-
+ write_unlock_bh(&bond->curr_slave_lock);
+ read_unlock(&bond->lock);
rtnl_unlock();
return count;
@@ -1190,7 +1193,8 @@ static ssize_t bonding_store_active_slave(struct
device *d,
struct bonding *bond = to_bond(d);
rtnl_lock();
- write_lock_bh(&bond->lock);
+ read_lock(&bond->lock);
+ write_lock_bh(&bond->curr_slave_lock);
if (!USES_PRIMARY(bond->params.mode)) {
printk(KERN_INFO DRV_NAME
@@ -1247,7 +1251,8 @@ static ssize_t bonding_store_active_slave(struct
device *d,
}
}
out:
- write_unlock_bh(&bond->lock);
+ write_unlock_bh(&bond->curr_slave_lock);
+ read_unlock(&bond->lock);
rtnl_unlock();
return count;
Vanilla 2.6.24-rc5 plus this patch:
=========================================================
[ INFO: possible irq lock inversion dependency detected ]
2.6.24-rc5 #1
---------------------------------------------------------
events/0/9 just changed the state of lock:
(&mc->mca_lock){-+..}, at: [<c0411c7a>] mld_ifc_timer_expire+0x130/0x1fb
but this lock took another, soft-read-irq-unsafe lock in the past:
(&bond->lock){-.--}
and interrupts could create inverse lock ordering between them.
Grrr, I should have seen that -- sorry. Try your luck with this instead:
<CUT>
No luck.
I'm guessing if we go back to using a write-lock for bond->lock this
will go back to working again, but I'm not totally convinced since there
are plenty of places where we used a read-lock with it.
Should I check this patch or rather, based on a future discussion, wait
for another version?
diff --git a/drivers/net/bonding/bond_sysfs.c
b/drivers/net/bonding/bond_sysfs.c
index 11b76b3..635b857 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -1075,7 +1075,10 @@ static ssize_t bonding_store_primary(struct device
*d,
struct slave *slave;
struct bonding *bond = to_bond(d);
+ rtnl_lock();
write_lock_bh(&bond->lock);
+ write_lock_bh(&bond->curr_slave_lock);
+
if (!USES_PRIMARY(bond->params.mode)) {
printk(KERN_INFO DRV_NAME
": %s: Unable to set primary slave; %s is in mode
%d\n",
@@ -1109,8 +1112,8 @@ static ssize_t bonding_store_primary(struct device
*d,
}
}
out:
+ write_unlock_bh(&bond->curr_slave_lock);
write_unlock_bh(&bond->lock);
-
rtnl_unlock();
return count;
@@ -1191,6 +1194,7 @@ static ssize_t bonding_store_active_slave(struct
device *d,
rtnl_lock();
write_lock_bh(&bond->lock);
+ write_lock_bh(&bond->curr_slave_lock);
if (!USES_PRIMARY(bond->params.mode)) {
printk(KERN_INFO DRV_NAME
@@ -1247,6 +1251,7 @@ static ssize_t bonding_store_active_slave(struct
device *d,
}
}
out:
+ write_unlock_bh(&bond->curr_slave_lock);
write_unlock_bh(&bond->lock);
rtnl_unlock();
Best regards,
Krzysztof Olędzki
For now, I prefer Jay's original patch -- with the read_locks (rather
than read/write_lock_bh) and the added rtnl_lock. There is still a
lockdep issue that we need to sort-out, but this patch is needed first.
This bug has not been fixed yet as it still exists in 2.6.24-rc7. Any
chances to cure it before 2.6.24-final?
Best regards,
Krzysztof Olędzki