Julian Elischer wrote:
> Gleb Smirnoff wrote:
> >On Thu, Dec 13, 2007 at 10:33:25AM -0800, Julian Elischer wrote:
> >J> Maxime Henrion wrote:
> >J> > Replying to myself on this one, sorry about that.
> >J> > I said in my previous mail that I didn't know yet what process was
> >J> > holding the lock of the rtentry that the routed process is dealing
> >J> > with in rt_setgate(), and I just could verify that it is held by
> >J> > the swi1: net thread.
> >J> > So, in a nutshell:
> >J> > - The routed process does its business on the routing socket, that ends up
> >J> >   calling rt_setgate().  While in rt_setgate() it drops the lock on its
> >J> >   rtentry in order to call rtalloc1().  At this point, the routed
> >J> >   process hold the gateway route (rtalloc1() returns it locked), and it
> >J> >   now tries to re-lock the original rtentry.
> >J> > - At the same time, the swi net thread calls arpresolve() which ends up
> >J> >   calling rt_check().  Then rt_check() locks the rtentry, and tries to
> >J> >   lock the gateway route.
> >J> > A classical case of deadlock with mutexes because of different locking
> >J> > order.  Now, it's not obvious to me how to fix it :-).
> >J>
> >J> On failure to re-lock, the routed call to rt_setgate should completely abort
> >J> and restart from scratch, releasing all locks it has on the way out.
> >
> >Do you suggest mtx_trylock?
>
> I think that would be the cleanest way..

So, here's what I've got.  I have yet to test it at all; I hope to be
able to do so today or tomorrow.  Any input appreciated.

Cheers,
Maxime

diff -Nru /sys/net/route.c net/route.c
--- /sys/net/route.c	Tue Oct 30 19:07:54 2007
+++ net/route.c	Mon Dec 17 11:05:56 2007
@@ -996,6 +996,7 @@
 	struct radix_node_head *rnh = rt_tables[dst->sa_family];
 	int dlen = SA_SIZE(dst), glen = SA_SIZE(gate);
 
+again:
 	RT_LOCK_ASSERT(rt);
 
 	/*
@@ -1029,7 +1030,16 @@
 			RT_REMREF(rt);
 			return (EADDRINUSE); /* failure */
 		}
-		RT_LOCK(rt);
+		/*
+		 * Try to reacquire the lock on rt, and if it fails,
+		 * clean state and restart from scratch.
+		 */
+		ok = RT_TRYLOCK(rt);
+		if (!ok) {
+			RTFREE_LOCKED(gwrt);
+			RT_LOCK(rt);
+			goto again;
+		}
 		/*
 		 * If there is already a gwroute, then drop it. If we
 		 * are asked to replace route with itself, then do
diff -Nru /sys/net/route.h net/route.h
--- /sys/net/route.h	Tue Apr  4 22:07:23 2006
+++ net/route.h	Fri Dec 14 11:47:48 2007
@@ -289,6 +289,7 @@
 #define	RT_LOCK_INIT(_rt) \
 	mtx_init(&(_rt)->rt_mtx, "rtentry", NULL, MTX_DEF | MTX_DUPOK)
 #define	RT_LOCK(_rt)		mtx_lock(&(_rt)->rt_mtx)
+#define	RT_TRYLOCK(_rt)		mtx_trylock(&(_rt)->rt_mtx)
 #define	RT_UNLOCK(_rt)		mtx_unlock(&(_rt)->rt_mtx)
 #define	RT_LOCK_DESTROY(_rt)	mtx_destroy(&(_rt)->rt_mtx)
 #define	RT_LOCK_ASSERT(_rt)	mtx_assert(&(_rt)->rt_mtx, MA_OWNED)
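
For anyone who wants to play with the trylock-and-restart idea outside
the kernel, here is a rough user-space sketch of the same pattern.  It
is not the kernel code and makes several assumptions: struct entry and
update_with_gateway() are hypothetical names, POSIX
pthread_mutex_trylock() stands in for mtx_trylock(), and the reference
counting done by rtalloc1() and RTFREE_LOCKED() is left out entirely.
The caller enters with rt locked, as rt_setgate() does.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for an rtentry: a mutex plus some state. */
struct entry {
	pthread_mutex_t	mtx;
	int		updates;
};

/*
 * Called with rt->mtx held; returns with rt->mtx still held and
 * gw->mtx released.  Returns how many restarts were needed.
 */
static int
update_with_gateway(struct entry *rt, struct entry *gw)
{
	int restarts = 0;

	for (;;) {
		/* Drop rt before taking gw, as is done around rtalloc1(). */
		pthread_mutex_unlock(&rt->mtx);
		pthread_mutex_lock(&gw->mtx);

		/*
		 * gw is held, so only *try* rt.  Blocking here while
		 * another thread holds rt and waits for gw is the deadlock.
		 */
		if (pthread_mutex_trylock(&rt->mtx) == 0)
			break;

		/*
		 * Lost the race: release gw, block on rt with nothing
		 * else held, and restart from scratch.
		 */
		pthread_mutex_unlock(&gw->mtx);
		pthread_mutex_lock(&rt->mtx);
		restarts++;
	}

	/* Both locks are held at this point. */
	rt->updates++;
	gw->updates++;
	pthread_mutex_unlock(&gw->mtx);
	return (restarts);
}
```

The point of the design is that the thread never sleeps on rt while it
holds gw, so the order in which the other thread takes its locks no
longer matters.  The price is a possible retry loop under contention.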
_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "[EMAIL PROTECTED]"