On Monday 23 June 2008 03:16:40 pm James Gritton wrote:
> John Baldwin wrote:
> > On Thursday 19 June 2008 11:57:51 am James Gritton wrote:
> >
> >> John Baldwin wrote:
> >>
> >>> On Sunday 15 June 2008 07:23:19 am Stef Walter wrote:
> >>>
> >>>> I've been trying to track down a deadlock on some newish production
> >>>> servers running FreeBSD 6.3-RELEASE-p2. The deadlock occurs on a
> >>>> specific (although mundane) hardware configuration, and each of several
> >>>> servers running this hardware deadlocks about once per week.
> >>>>
> >>>> That said, from a (naive) perusal of the attached stack traces, I
> >>>> suspect that this is not hardware related.
> >>>>
> >>>> Forgive me if my interpretation of this is all wrong, but I'm pretty
> >>>> desperate for help. So here's my basic understanding of the deadlock:
> >>>>
> >>>> These processes seem to be waiting on the page queue mutex:
> >>>> sendmail (in vm_mmap > vm_map_find > vm_map_insert > vm_map_pmap_enter)
> >>>> bsnmpd (in malloc, uma_large_malloc > page_alloc > kmem_malloc)
> >>>> httpd (in trap > trap_pfault > vm_fault)
> >>>> [g_up] (in g_vfs_done > bufdone)
> >>>>
> >>>> The page queue mutex is held by the rsync process:
> >>>> rsync (in trap > trap_pfault > vm_fault > pmap_enter)
> >>>>
> >>>> Was the rsync process interrupted inside the kernel (in pmap_enter)
> >>>> while holding the page queue lock?
> >>>>
> >>>> Giant is enabled in loader.conf due to the needs of the pf firewall when
> >>>> dealing with user credential lookups. I do not believe that Giant plays
> >>>> into this deadlock. Kernel config attached.
> >>>>
> >>>> Any and all help or info is welcome. Thanks in advance.
> >>>>
> >>>
> >>> Try this change:
> >>>
> >>> jhb 2007-10-27 22:07:40 UTC
> >>>
> >>> FreeBSD src repository
> >>>
> >>> Modified files:
> >>> sys/kern sched_4bsd.c
> >>> Log:
> >>> Change the roundrobin implementation in the 4BSD scheduler to trigger a
> >>> userland preemption directly from hardclock() via sched_clock() when a
> >>> thread uses up a full quantum instead of using a periodic timeout to
> >>> cause a userland preemption every so often. This fixes a potential deadlock
> >>> when IPI_PREEMPTION isn't enabled where softclock blocks on a lock held
> >>> by a thread pinned or bound to another CPU. The current thread on that
> >>> CPU will never be preempted while softclock is blocked.
> >>>
> >>> Note that ULE already drives its round-robin userland preemption from
> >>> sched_clock() as well and always enables IPI_PREEMPT.
> >>>
> >>> MFC after: 1 week
> >>>
> >>> Revision Changes Path
> >>> 1.108 +8 -29 src/sys/kern/sched_4bsd.c
> >>>
> >>> We use it at work on 6.x. Without this fix, round-robin stops working on
> >>> 4BSD when softclock() (swi4: clock) blocks on a lock like Giant.
> >>>
> >>
> >> I've been seeing similar troubles on 6.2 and I'll have to give this a
> >> try as we upgrade to 6.3. I notice "MFC after: 1 week" in the log; it's
> >> been a week - any chance of seeing this fix rolled into 6.x?
> >>
> >
> > If people confirm it fixes issues I will MFC it. There was some pushback when
> > I first committed it so I waited on the MFC.
>
> I can confirm that on 6.3 I can recreate the deadlock without the patch,
> and can't recreate it with the patch.
Ok, I've merged it to RELENG_[67] (the 6.x and 7.x stable branches).

-- 
John Baldwin
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "[EMAIL PROTECTED]"