On Wed Aug 04 2010 at 13:21:07 +0000, Andrew Doran wrote:
> On Sat, Jul 31, 2010 at 08:31:19PM +0300, Antti Kantee wrote:
> > Hi,
> >
> > I'm looking at a KASSERT which is triggering quite rarely for me (in
> > terms of iterations):
> >
> > panic: kernel diagnostic assertion "dt->dt_ff[i]->ff_refcnt == 0" failed:
> > file
> > "/usr/allsrc/src/sys/rump/librump/rumpkern/../../../kern/kern_descrip.c",
> > line 856
> >
> > Upon closer examination, it seems that this can trigger while another
> > thread is in fd_getfile() between upping the refcount, testing for
> > ff_file, and fd_putfile(). Removing the KASSERT seems to restore correct
>
> You're right there, the KASSERT() is wrong, it should be removed.

Thanks, I'll do that.

> > operation, but I didn't read the code far enough to see where the race
> > is actually handled and what stops the code from using the wrong file.
>
> FYI the fdfile_t (per-descriptor records) are stable for the lifetime of the
> process, what each record describes can and does of course change, and how
> those records are pointed to does change (fdtab_t).
>
> There isn't really a concept of "wrong file", as in, the app gets
> what it asked for. It is free to ask for the wrong thing, and it's free
> to ask for the right thing at the wrong time, etc - that's its problem.
>
> Unless you're alluding to another bug?

Not really. I just started thinking about how applications can make sure
they use the right file descriptor. It seems that using close() to notify
other threads that a file descriptor is being closed is racy. So something
naive like this:

t1: lock
t1: get fd1
t1: unlock
/* t1 wants to do a syscall with fd1 but is preempted */
t2: lock
t2: close fd1
t2: unlock
t3: lock
t3: open, result fd1
t3: unlock
t1: syscall fd1

... will give you the wrong result: t1's syscall operates on the file t3
just opened. Essentially there is no interlock between the application's
lookup and the kernel's lookup of the backing object.

So I guess if you want things to work correctly, instead of close() you
need to dup2() a zombie/"deadfs" fd over the descriptor and wait for all
threads to check in before you can close it. (I assume dup2 is atomic.)

Never realized file descriptors and threads were so tricky ;)
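A minimal C sketch of the race and of the dup2() workaround above, under stated assumptions: /dev/null opened read-only stands in for the zombie/"deadfs" descriptor (reads see EOF, writes fail), quiesce() is a hypothetical application hook for "wait for all threads to check in", and dup2() is assumed to replace the descriptor atomically (the Linux dup2(2) manual page documents this; check your platform). The names fd_number_is_reused(), open_dead_fd() and retire_fd() are illustrative only, not NetBSD APIs.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Why close() is a poor "this fd is gone" signal: POSIX open() returns
 * the lowest free descriptor, so a number freed by close() can be handed
 * straight back to an unrelated open(). Single-threaded replay of the
 * t2/t3 steps. Returns 1 if the number was reused, 0 if not, -1 on error.
 */
static int
fd_number_is_reused(const char *path)
{
	int fd1 = open(path, O_RDONLY);
	if (fd1 == -1)
		return -1;
	close(fd1);                        /* t2: close fd1 */
	int fd2 = open(path, O_RDONLY);    /* t3: open, result fd1 */
	if (fd2 == -1)
		return -1;
	int reused = (fd2 == fd1);         /* t1's syscall would hit t3's file */
	close(fd2);
	return reused;
}

/*
 * ASSUMPTION: /dev/null opened read-only as a stand-in "dead" file;
 * there is no portable deadfs descriptor.
 */
static int
open_dead_fd(void)
{
	return open("/dev/null", O_RDONLY);
}

/*
 * Retire fd without ever leaving its number free: dup2() swaps the dead
 * file in (assumed atomic), so a racing syscall on fd sees the dead file
 * rather than a newly opened one. quiesce is a hypothetical hook that
 * must return only after every thread that looked up fd has checked in.
 */
static int
retire_fd(int fd, int deadfd, void (*quiesce)(void))
{
	if (dup2(deadfd, fd) == -1)
		return -1;
	if (quiesce != NULL)
		quiesce();
	return close(fd);    /* only now may the number be reused */
}
```

The design choice is that the descriptor number stays allocated, pointing at a harmless file, for as long as any thread might still use it; the actual close() is deferred until the application knows no stale lookups remain.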
