On Wed, Jan 04, 2012 at 04:40:57PM +0100, Lars Ellenberg wrote:
> On Tue, Jan 03, 2012 at 08:16:15PM +0100, Florian Haas wrote:
> > Hi,
> >
> > DRBD 8.3.12 on CentOS 6.2; SDP from kernel-ib-1.5.3, built locally
>
> You too, of all people?
>
> Something crops up, and since DRBD is used at the same time,
> it has to be DRBD's fault?
>
> I mean, that's possible, of course. But ...
Hm, well, I love to correct my own arrogant statements ;-)
> > # drbdadm down all; lsmod | grep ib_sdp
> > ib_sdp 130827 4294967294
> >
> > 4 billion references on that module look excessive. :) I suppose the
> > refcount incorrectly goes negative.
>
> Sure. That's a -2.
>
> > This is inconvenient as you're now unable to unload ib_sdp. I presume
> > this is a bug;
>
> /me too ;-)
>
> At this point, though, I doubt it is a DRBD bug.
>
> All module refcounting is implicit, so I would expect the counts on
> all other network-related modules to go wrong as well.
Hm. We'll see about that.
> Besides, I think I already complained about that to the OFED guys
> about two and a half years ago,
> when I helped fix their memory leak and frame corruption.
>
> I never pressed the issue, though,
> and cannot remember any useful response.
>
> Of course it _may_ be DRBD, or something that DRBD could work around,
> but I suspect it is something in the OFED stack.
> If they reason otherwise, I'll listen.
>
> > if I can provide any traces or debug logs to narrow
> > down the issue I'll be happy to.
>
> Let us know what you find out.
Based on some git blaming, I found:
  drbd:   53eb779 (July 2008)
  kernel: ac5a488e (long ago), 1b08534e (Dec 2008)
The relevant part of the latter is:
commit 1b08534e562dae7b084326f8aa8cc12a4c1b6593
net: Fix module refcount leak in kernel_accept()
...
diff --git a/net/socket.c b/net/socket.c
index 92764d8..76ba80a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2307,6 +2307,7 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
 	}
 
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 done:
 	return err;
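So: DRBD does it the way the kernel did back in July 2008,
only the kernel was doing it wrong, and got fixed in December :-/
The accepted socket inherits sock->ops, and sock_release() later drops
a reference on ops->owner, but nobody ever took that reference. Every
accepted connection therefore leaves the module count one lower than
it should be.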
You can check whether you see the same imbalance when using IPv6
(built as a module).
And you can try a patch:
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 0e55c45..7decee3 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -528,6 +528,7 @@ STATIC int drbd_accept(struct drbd_conf *mdev, const char **what,
 		goto out;
 	}
 	(*newsock)->ops = sock->ops;
+	__module_get((*newsock)->ops->owner);
 
 out:
 	return err;
Thanks,
Lars
--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
_______________________________________________
drbd-user mailing list
http://lists.linbit.com/mailman/listinfo/drbd-user