On Wed, Mar 07, 2007 at 06:07:31PM -0500, Ed Maste wrote:
> Nightly tests on our 6.1-based installation using pgsql have resulted in
> a number of kernel hangs, due to a corrupt semu_list (the list ended up
> with a loop).
>
> It seems there are a few holes in the locking in the semaphore code. The
> issue we've encountered comes from semexit_myhook. It obtains a pointer
> to a list element after acquiring SEMUNDO_LOCK, and after dropping the
> lock manipulates the next pointer to remove the element from the list.
>
> The fix below solves our current problem. Any comments?
>
> --- RELENG_6/src/sys/kern/sysv_sem.c	Tue Jun 7 01:03:27 2005
> +++ swbuild_plt_boson/src/sys/kern/sysv_sem.c	Tue Mar 6 16:13:45 2007
> @@ -1259,16 +1259,17 @@
>  	struct proc *p;
>  {
>  	struct sem_undo *suptr;
> -	struct sem_undo **supptr;
>
>  	/*
>  	 * Go through the chain of undo vectors looking for one
>  	 * associated with this process.
>  	 */
>  	SEMUNDO_LOCK();
> -	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
> -		if (suptr->un_proc == p)
> +	SLIST_FOREACH(suptr, &semu_list, un_next) {
> +		if (suptr->un_proc == p) {
> +			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
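One way to close the window Ed describes is to do the lookup and the unlink inside the same critical section. The sketch below is a userspace approximation, not the actual sysv_sem.c code: a pthread mutex stands in for SEMUNDO_LOCK(), and `struct undo`, `undo_new()`, `undo_detach()` and `undo_count()` are hypothetical names. It keeps the SLIST_FOREACH_PREVPTR walk from the original code and breaks out immediately after unlinking, so the removed element's next pointer is never followed. The fallback macro copies FreeBSD's definition, assuming a glibc-style <sys/queue.h> that lacks it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

/* FreeBSD's <sys/queue.h> provides this macro; glibc's older copy does not. */
#ifndef SLIST_FOREACH_PREVPTR
#define SLIST_FOREACH_PREVPTR(var, varp, head, field)	\
	for ((varp) = &SLIST_FIRST((head));		\
	    ((var) = *(varp)) != NULL;			\
	    (varp) = &SLIST_NEXT((var), field))
#endif

/* Hypothetical stand-in for struct sem_undo. */
struct undo {
	int owner;			/* plays the role of un_proc */
	SLIST_ENTRY(undo) link;		/* plays the role of un_next */
};

static SLIST_HEAD(, undo) undo_list = SLIST_HEAD_INITIALIZER(undo_list);
/* Userspace mutex standing in for SEMUNDO_LOCK(). */
static pthread_mutex_t undo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Allocate an entry for `owner` and link it in under the lock. */
static struct undo *
undo_new(int owner)
{
	struct undo *u = malloc(sizeof(*u));

	if (u == NULL)
		return (NULL);
	u->owner = owner;
	pthread_mutex_lock(&undo_lock);
	SLIST_INSERT_HEAD(&undo_list, u, link);
	pthread_mutex_unlock(&undo_lock);
	return (u);
}

/*
 * Find and unlink the entry for `owner` in ONE critical section, so no
 * other thread can change the list between finding the element and
 * rewriting the link that points at it.  Returns the entry or NULL.
 */
static struct undo *
undo_detach(int owner)
{
	struct undo *u, **up;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH_PREVPTR(u, up, &undo_list, link) {
		if (u->owner == owner) {
			*up = SLIST_NEXT(u, link);	/* O(1) unlink via previous link */
			break;				/* never step through u again */
		}
	}
	pthread_mutex_unlock(&undo_lock);
	return (u);		/* NULL if the walk ran off the end */
}

/* Count the entries currently on the list, under the lock. */
static int
undo_count(void)
{
	struct undo *u;
	int n = 0;

	pthread_mutex_lock(&undo_lock);
	SLIST_FOREACH(u, &undo_list, link)
		n++;
	pthread_mutex_unlock(&undo_lock);
	return (n);
}
```

The caller can free the detached entry after the lock is released, since no other thread can reach it once it is off the list.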
This is wrong: you cannot remove an element from a *LIST while iterating over it with *LIST_FOREACH, because the macro advances by reading the next pointer of the element you have just removed. Use *LIST_FOREACH_SAFE instead, which saves the next pointer before running the loop body.

Thanks for the patch!

roman
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
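To illustrate roman's point: SLIST_FOREACH computes the next element as `SLIST_NEXT(var, field)` after the body has run, so if the body unlinks and frees `var`, that step reads freed memory. SLIST_FOREACH_SAFE copies the successor into a temporary before the body runs. The sketch below is a minimal userspace example, assuming a glibc-style <sys/queue.h>; the fallback macro copies FreeBSD's definition, and `struct node` and `remove_all()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/queue.h>

/* FreeBSD's <sys/queue.h> provides this macro; glibc's older copy does not. */
#ifndef SLIST_FOREACH_SAFE
#define SLIST_FOREACH_SAFE(var, head, field, tvar)		\
	for ((var) = SLIST_FIRST((head));			\
	    (var) && ((tvar) = SLIST_NEXT((var), field), 1);	\
	    (var) = (tvar))
#endif

/* Hypothetical list element. */
struct node {
	int key;
	SLIST_ENTRY(node) link;
};

SLIST_HEAD(node_list, node);

/*
 * Unlink and free every node whose key matches.  The successor is saved
 * in `tmp` before the body runs, so freeing `n` does not break the walk.
 * Returns the number of nodes removed.
 */
static int
remove_all(struct node_list *head, int key)
{
	struct node *n, *tmp;
	int removed = 0;

	SLIST_FOREACH_SAFE(n, head, link, tmp) {
		if (n->key == key) {
			SLIST_REMOVE(head, n, node, link);	/* rescans from head */
			free(n);
			removed++;
		}
	}
	return (removed);
}
```

SLIST_REMOVE searches from the head for the predecessor on every call, so this loop can be quadratic on long lists; the SLIST_FOREACH_PREVPTR walk in the original code unlinks in constant time, provided the loop stops right after the unlink.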