I decided to give vinum a try a few days ago and ran into the same problem as Vallo - after using it for a short period of time it caused a kernel panic due to a page fault.
I spent some time with kgdb today and believe that I have found the bug. On Thu, Mar 18, 1999 at 01:31:23PM +1030, Greg Lehey wrote: > This is a problem I've seen before, but it completely baffles me. The > request passed to launch_requests (frame 10) has been deallocated. > Some of the debug code I put in caught it: > > (kgdb) p freeinfo[7] > $2 = { > time = { > tv_sec = 921669613, > tv_usec = 289712 > }, > seq = 24, > size = 36, > line = 174, > address = 0xf0a3cb00 > "ÞÀÞh\235\"ðÞÀÞÀÈ£ðÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞÞÀÞ", > file = "vinuminterrupt.c" > } > > This was called from freerq, which frees the complete request. freerq > is called from only four places: one on completion of the request > (which in this case is just about to be started), one if the request > is aborted (which also sets bp->b_error, which is not set here), once > in a read context (which is not applicable here: it's a write), and > once just before the call to launch_requests in frame 11: The best that I can tell, the problem is with the first call that you listed: "on completion of the request". The function 'complete_rqe' is called asynchronously by an interrupt at the completion of the I/O request. > So where is this coming from? I'm completely baffled. It doesn't > happen to most people, though I have had reports of one or two other > cases. About the only clue is that the problem didn't occur when I > removed the debug memory allocator, but I don't know whether it went > away or into hiding. I'd really like to find out what's going on > here. I think that removing the debug memory allocator just made it go into hiding because it changed the timing of the code. Freeing the request structure in the interrupt routine is causing a race condition in the function 'launch_requests'. Interrupts must be disabled around any and all code which refers to the request chain and this wasn't being done. I have created a patch that seems to fix the problem. However, there could be other places in the code that refers to the request chain without disabling interrupts. After looking at it for only a few hours, I'm not familiar enough with it to tell. Hope this helps. Here's the patch: diff -u vinum/vinumrequest.c vinum-mod/vinumrequest.c --- vinum/vinumrequest.c Thu Mar 18 20:21:46 1999 +++ vinum-mod/vinumrequest.c Fri Mar 19 22:55:49 1999 @@ -258,13 +258,8 @@ biodone(bp); freerq(rq); return -1; - } { /* XXX */ - int result; - int s = splhigh(); - result = launch_requests(rq, reviveok); /* now start the requests if we can */ - splx(s); - return result; } + return launch_requests(rq, reviveok); /* now start the requests if we can */ } else /* * This is a write operation. We write to all @@ -366,6 +361,7 @@ if (debug & DEBUG_LASTREQS) logrq(loginfo_user_bpl, rq->bp, rq->bp); #endif + s = splbio(); for (rqg = rq->rqg; rqg != NULL; rqg = rqg->next) { /* through the whole request chain */ rqg->active = rqg->count; /* they're all active */ rq->active++; /* one more active request group */ @@ -396,13 +392,13 @@ logrq(loginfo_rqe, rqe, rq->bp); #endif /* fire off the request */ - s = splbio(); (*bdevsw[major(rqe->b.b_dev)]->d_strategy) (&rqe->b); - splx(s); } /* XXX Do we need caching? Think about this more */ } } + splx(s); + return 0; } I remove the splhigh/splx from around the first call of launch_requests because, as far as I can tell, it became redundant after adding splbio/splx around the for loop in the launch_requests function. ------ Russell Neeper Texas A&M University russell-nee...@tamu.edu Computing & Information Services Network Group To Unsubscribe: send mail to majord...@freebsd.org with "unsubscribe freebsd-current" in the body of the message