Alexey, Roland, In debugging kernel lockup that occurs with IP over InfiniBand in 2.6.21-rc4: ( https://bugs.openfabrics.org/show_bug.cgi?id=402 )
I noticed the following code in dst_ifdown: /* Dirty hack. We did it in 2.2 (in __dst_free), * we have _very_ good reasons not to repeat * this mistake in 2.3, but we have no choice * now. _It_ _is_ _explicit_ _deliberate_ * _race_ _condition_. * * Commented and originally written by Alexey. */ static inline void dst_ifdown(struct dst_entry *dst, struct net_device *dev, int unregister) { if (dst->ops->ifdown) dst->ops->ifdown(dst, dev, unregister); if (dev != dst->dev) return; if (!unregister) { dst->input = dst_discard_in; dst->output = dst_discard_out; } else { dst->dev = &loopback_dev; dev_hold(&loopback_dev); dev_put(dev); if (dst->neighbour && dst->neighbour->dev == dev) { dst->neighbour->dev = &loopback_dev; dev_put(dev); dev_hold(&loopback_dev); } } } The line dst->neighbour->dev = &loopback_dev breaks IP over InfiniBand, simply because neighbour->parms still points to an entry that has been set up with dev->neigh_setup call from IPoIB neighbour device. So when neighbour->parms->neigh_destructor is called, we get to ipoib_neigh_destructor in drivers/infiniband/ulp/ipoib/ipoib_main.c, and that in turn crashes since it needs an infiniband device in neighbour dev pointer. This is not new code, and should have triggered long time ago, so I am not sure how come we are triggering this only now, but somehow this did not lead to crashes in 2.6.20, but does now in 2.6.21-rc4. Ideas on how to fix this? Why is neighbour->dev changed here? Can dst->neighbour be changed to point to NULL instead, and the neighbour released? Thanks very much, -- MST - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/