On Sun, Jan 07, 2007 at 11:49:56AM +0000, Robert Watson wrote:
> On Sat, 6 Jan 2007, Ceri Davies wrote:
>
> >>>So far it's happened this morning and yesterday morning. I haven't seen
> >>>it before that. I don't know the cause so I can't reproduce it at will,
> >>>but the logs don't give any indication. Chances are that it will happen
> >>>again tomorrow, but we'll see.
> >>
> >>Hmm. It looks like you printf *(td->td_proc->p_fd->fd_ofiles) without
> >>the array index. Could you repeat that, but with the array index --
> >>i.e., td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be
> >>useful to print uap->fd. Right now you're printing stdin (index 0), but
> >>if the index is non-0, we want a different file.
> >
> >Very tactfully put :) Sorry about that.
> >
> >None of the uap->fd's seem to be valid. In the first case, uap->fd is way
> >too high for the length of fd_ofiles, which only has 21 elements:
> >
> >(kgdb) up 8
> >#8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075 error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = 89
> >(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
> >Cannot access memory at address 0x0
> >
> >In the second, uap->fd is nonsense:
> >
> >(kgdb) up 8
> >#8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at
> >/usr/src/sys/kern/kern_descrip.c:1075
> >1075 error = kern_fstat(td, uap->fd, &ub);
> >(kgdb) p uap->fd
> >$1 = -1023449232
> >(kgdb)
>
> Hmm. So, I reviewed audit_arg_file() closely, and after staring at the
> code a lot, couldn't see anything obvious in either the socket or the
> vnode/fifo case. I did fix one other bug there, however, which can never
> actually be exercised in 7-CURRENT, and is fairly unlikely in 6-STABLE, and
> will MFC that in a week.
OK, thanks.

> Could you try printing *td->td_ar? Maybe this will give us a clue as to
> how far it got. In particular, this may be able to more reliably give us
> the file descriptor number, which is audited early in the system call. You
> might find that 'td' is corrupted in many layers of the stack, keep going
> up until you find one where it's good. It may well be that
> td->td_ar->k_ar.ar_arg_fd is correct, and might confirm that uap->fd is
> correct still. We'd like also to know if ARG_SOCKINFO, ARG_VNODE1, or
> ARG_VNODE2 is set in the k_ar.ar_valid_arg field. This may tell us some
> more about the file descriptor even though it appears to have vanished.

*td->td_ar is null (0x0) in both cases...

> I'm quite worried by the fact that the file descriptor seems not to be
> present any more -- this suggests a file descriptor related race of the
> sort that is both quite difficult to figure out and also quite a risk.
> It's strange that it would only trigger with audit, however--perhaps audit
> stretches out the race. Is this an SMP box?

It's certainly looking quite nasty. This system is UP hardware without
options SMP.

> Could you print the entire contents of *td->td_proc->p_fd?
First case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc3441000, fd_ofileflags = 0xc3441100 "",
  fd_cdir = 0xc367f110, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64,
  fd_map = 0xc3b65970, fd_lastfile = 20, fd_freefile = 16, fd_cmask = 63,
  fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0, fd_wanted = 0,
  fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0,
  fd_holdleaderswakeup = 0}

Second case:

(kgdb) p *td->td_proc->p_fd
$2 = {fd_ofiles = 0xc2d23600, fd_ofileflags = 0xc2d23700 "",
  fd_cdir = 0xc31b8660, fd_rdir = 0xc2ce2bb0, fd_jdir = 0x0, fd_nfiles = 64,
  fd_map = 0xc2e9c1c0, fd_lastfile = 20, fd_freefile = 17, fd_cmask = 63,
  fd_refcnt = 1, fd_holdcnt = 1, fd_mtx = {mtx_object = {
      lo_class = 0xc06ad4c4, lo_name = 0xc067c0fd "filedesc structure",
      lo_type = 0xc067c0fd "filedesc structure", lo_flags = 196608,
      lo_list = {tqe_next = 0x0, tqe_prev = 0x0}, lo_witness = 0x0},
    mtx_lock = 4, mtx_recurse = 0}, fd_locked = 0, fd_wanted = 0,
  fd_kqlist = {slh_first = 0x0}, fd_holdleaderscount = 0,
  fd_holdleaderswakeup = 0}

If it's at all useful, I can provide access to this system and the dumps.

Ceri
-- 
That must be wonderful! I don't understand it at all.
                                                  -- Moliere
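For reference, the reason neither uap->fd value can name an open file can be
sketched in userland C. This is only an illustration, not the kernel's code:
struct filedesc_sketch and fd_is_valid() are hypothetical names, and the
struct models just the two fields from the dumps above (fd_ofiles and
fd_nfiles) that the check needs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the kernel's struct filedesc.
 * Only the two fields used by the check are modelled.
 */
struct filedesc_sketch {
	void	**fd_ofiles;	/* open file table */
	int	  fd_nfiles;	/* number of slots allocated in fd_ofiles */
};

/*
 * Return 1 if fd indexes an allocated, non-NULL slot in the table,
 * otherwise 0.  The range test must come first so that fd_ofiles is
 * never indexed with a negative or too-large value.
 */
static int
fd_is_valid(const struct filedesc_sketch *fdp, int fd)
{
	assert(fdp != NULL);

	if (fd < 0 || fd >= fdp->fd_nfiles)
		return (0);
	return (fdp->fd_ofiles[fd] != NULL);
}
```

With fd_nfiles = 64, as in both dumps, this check rejects fd = 89 (past the
end of the table) and fd = -1023449232 (negative) before fd_ofiles is ever
indexed.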