Gleb Smirnoff wrote:
> On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
> J>  Maxime Henrion wrote:
> J> > Replying to myself on this one, sorry about that.
> J> > I said in my previous mail that I didn't know yet what process was
> J> > holding the lock of the rtentry that the routed process is dealing
> J> > with in rt_setgate(), and I just could verify that it is held by
> J> > the swi1: net thread.
> J> > So, in a nutshell:
> J> > - The routed process does its business on the routing socket, which
> J> >   ends up calling rt_setgate().  While in rt_setgate() it drops the
> J> >   lock on its rtentry in order to call rtalloc1().  At this point, the
> J> >   routed process holds the gateway route (rtalloc1() returns it
> J> >   locked), and it now tries to re-lock the original rtentry.
> J> > - At the same time, the swi net thread calls arpresolve() which ends up
> J> >   calling rt_check().  Then rt_check() locks the rtentry, and tries to
> J> >   lock the gateway route.
> J> > A classical case of deadlock with mutexes because of different locking
> J> > order.  Now, it's not obvious to me how to fix it :-).
> J> 
> J>  On failure to re-lock, the routed call to rt_setgate should completely
> J>  abort and restart from scratch, releasing all locks it has on the way out.
> 
> Do you suggest mtx_trylock?

I actually have the beginning of a patch that uses mtx_trylock(),
wrapped into a RT_TRYLOCK() macro.  It certainly isn't very pretty,
but if it gives me a workaround, that'd still be useful.
It really seems like the real fix would involve a fair amount of
rewriting and analysis of the current code, so...

I have yet to find time to finish it, build-test it, and run-test it.
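
To make the idea concrete, here is a minimal userland sketch of the
try-lock-and-back-off pattern.  It is not the patch itself and not kernel
code: it uses POSIX pthread_mutex_trylock() in place of mtx_trylock(), and
struct route, route_trylock(), relock_or_backoff() and set_gateway() are
hypothetical names made up for the illustration.  The point is only that
the thread holding the gateway lock never blocks on the original entry's
lock; if it cannot get it, it drops everything and starts over.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical stand-in for an rtentry: a mutex plus one field. */
struct route {
    pthread_mutex_t lock;
    int gateway_id;
};

/* Hypothetical analogue of an RT_TRYLOCK()-style wrapper: true if acquired. */
bool route_trylock(struct route *rt)
{
    return pthread_mutex_trylock(&rt->lock) == 0;
}

/*
 * Entered with 'gw' locked and 'rt' unlocked, which mirrors the state
 * described above after the gateway lookup has returned a locked route.
 * Try to re-lock 'rt' without blocking.  On failure, release 'gw' so the
 * caller holds no locks and can restart from scratch.
 * Returns true with both locks held, false with neither held.
 */
bool relock_or_backoff(struct route *rt, struct route *gw)
{
    if (route_trylock(rt))
        return true;
    int rc = pthread_mutex_unlock(&gw->lock);
    assert(rc == 0);
    (void)rc;
    return false;
}

/*
 * Retry loop around the back-off.  Returns the attempt number that
 * succeeded, or -1 if 'rt' stayed contended for max_tries attempts.
 */
int set_gateway(struct route *rt, struct route *gw, int id, int max_tries)
{
    for (int attempt = 1; attempt <= max_tries; attempt++) {
        /* Stands in for the lookup that hands back the gateway locked. */
        pthread_mutex_lock(&gw->lock);
        if (relock_or_backoff(rt, gw)) {
            rt->gateway_id = id;
            pthread_mutex_unlock(&rt->lock);
            pthread_mutex_unlock(&gw->lock);
            return attempt;
        }
        /* Both locks are released here; let the other thread progress. */
        sched_yield();
    }
    return -1;
}
```

A caller would treat -1 as "give up for now and report an error" rather
than spinning forever.  Whether a bounded retry is acceptable in the real
rt_setgate() path is a separate question that this sketch does not answer.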

Did you get any further with the rt_check() cleanup you've been telling
me about?

Cheers,
Maxime

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
