Nightly tests on our 6.1-based installation using pgsql have resulted in a number of kernel hangs, caused by a corrupted semu_list (the list ended up containing a loop).

It seems there are a few holes in the locking in the semaphore code. The issue we've encountered comes from semexit_myhook. It looks up the list element, and the pointer that links to it, while holding SEMUNDO_LOCK, but only unlinks the element after dropping the lock. By then the saved pointer may be stale, because another thread can have changed the list in between.

The fix below removes the element from the list while the lock is still held, and takes the lock again around clearing un_proc. It solves our current problem. Any comments?

--- RELENG_6/src/sys/kern/sysv_sem.c Tue Jun 7 01:03:27 2005
+++ swbuild_plt_boson/src/sys/kern/sysv_sem.c Tue Mar 6 16:13:45 2007
@@ -1259,16 +1259,17 @@
 	struct proc *p;
 {
 	struct sem_undo *suptr;
-	struct sem_undo **supptr;
 
 	/*
 	 * Go through the chain of undo vectors looking for one
 	 * associated with this process.
 	 */
 	SEMUNDO_LOCK();
-	SLIST_FOREACH_PREVPTR(suptr, supptr, &semu_list, un_next) {
-		if (suptr->un_proc == p)
+	SLIST_FOREACH(suptr, &semu_list, un_next) {
+		if (suptr->un_proc == p) {
+			SLIST_REMOVE(&semu_list, suptr, sem_undo, un_next);
 			break;
+		}
 	}
 	SEMUNDO_UNLOCK();
 
@@ -1328,8 +1329,9 @@
 	 * Deallocate the undo vector.
 	 */
 	DPRINTF(("removing vector\n"));
+	SEMUNDO_LOCK();
 	suptr->un_proc = NULL;
-	*supptr = SLIST_NEXT(suptr, un_next);
+	SEMUNDO_UNLOCK();
 }
 
 static int