Gleb Smirnoff wrote:
> On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
> J> Maxime Henrion wrote:
> J> > Replying to myself on this one, sorry about that.
> J> > I said in my previous mail that I didn't know yet what process was
> J> > holding the lock of the rtentry that the routed process is dealing
> J> > with in rt_setgate(), and I just could verify that it is held by
> J> > the swi1: net thread.
> J> > So, in a nutshell:
> J> > - The routed process does its business on the routing socket, that ends up
> J> >   calling rt_setgate().  While in rt_setgate() it drops the lock on its
> J> >   rtentry in order to call rtalloc1().  At this point, the routed
> J> >   process hold the gateway route (rtalloc1() returns it locked), and it
> J> >   now tries to re-lock the original rtentry.
> J> > - At the same time, the swi net thread calls arpresolve() which ends up
> J> >   calling rt_check().  Then rt_check() locks the rtentry, and tries to
> J> >   lock the gateway route.
> J> > A classical case of deadlock with mutexes because of different locking
> J> > order.  Now, it's not obvious to me how to fix it :-).
> J>
> J> On failure to re-lock, the routed call to rt_setgate should completely abort
> J> and restart from scratch, releasing all locks it has on the way out.
>
> Do you suggest mtx_trylock?

I actually have the beginning of a patch that uses mtx_trylock(), wrapped
in an RT_TRYLOCK() macro.  It certainly isn't very pretty, but if it gives
me a workaround, it would still be useful.  The real fix seems to involve
a fair amount of rewriting and analysis of the current code, so...

I have yet to find time to finish it, build-test it, and run-test it.

Did you get any further with the rt_check() cleanup you've been telling me
about?

Cheers,
Maxime
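
To make the try-lock idea concrete, here is a minimal user-space sketch of the back-off pattern. It is not the actual patch and not kernel code: POSIX pthread_mutex_trylock() stands in for mtx_trylock(), and struct route, relock_or_back_off() and set_gateway() are hypothetical names invented for this illustration. The assumption is that the caller already holds the gateway route (as after rtalloc1()) and must not block on the original entry while holding it.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical stand-in for an rtentry: a lock plus a counter to mutate. */
struct route {
	pthread_mutex_t lock;
	int updates;
};

/*
 * Called with gw locked and rt unlocked.  Try to take rt without blocking.
 * Returns 1 with both locks held, or 0 after releasing gw so that the
 * caller holds nothing and can restart from scratch.
 */
static int
relock_or_back_off(struct route *rt, struct route *gw)
{
	if (pthread_mutex_trylock(&rt->lock) == 0)
		return (1);
	/* Someone else holds rt (and may be waiting for gw): back off. */
	pthread_mutex_unlock(&gw->lock);
	return (0);
}

/*
 * Retry loop around the back-off.  Returns the number of restarts needed;
 * both locks are released before returning.
 */
static int
set_gateway(struct route *rt, struct route *gw)
{
	int retries = 0;

	assert(rt != gw);	/* locking the same mutex twice would hang */
	for (;;) {
		/* Models the lookup that hands back the gateway locked. */
		pthread_mutex_lock(&gw->lock);
		if (relock_or_back_off(rt, gw))
			break;
		retries++;
		sched_yield();	/* give the other lock holder a chance to run */
	}
	rt->updates++;		/* the work that needs both locks */
	pthread_mutex_unlock(&gw->lock);
	pthread_mutex_unlock(&rt->lock);
	return (retries);
}
```

Because the thread never sleeps on the second lock while holding the first, the opposite lock order in the other thread can no longer deadlock; the cost is a possible spin under contention, which is why this is only a workaround.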