On Sat, 6 Jan 2007, Ceri Davies wrote:
So far it's happened this morning and yesterday morning. I haven't seen
it before that. I don't know the cause so I can't reproduce it at will,
but the logs don't give any indication. Chances are that it will happen
again tomorrow, but we'll see.
Hmm. It looks like you printed *(td->td_proc->p_fd->fd_ofiles) without the
array index. Could you repeat that, but with the array index -- i.e.,
td->td_proc->p_fd->fd_ofiles[uap->fd]? Also, it would probably be useful
to print uap->fd. Right now you're printing stdin (index 0), but if the
index is non-0, we want a different file.
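To illustrate the point about the missing array index, here is a minimal userland sketch in C. The struct file_stub type and both helper names are hypothetical stand-ins and are not kernel code. Dereferencing the table pointer directly always yields entry 0, whereas indexing with the descriptor yields the entry the system call was actually working on.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct file; illustration only. */
struct file_stub {
	int f_type;
};

/* What *(fd_ofiles) evaluates to: always the entry at index 0 (stdin). */
static struct file_stub *
first_entry(struct file_stub **ofiles)
{
	return (*ofiles);
}

/* What fd_ofiles[fd] evaluates to: the entry for the given descriptor. */
static struct file_stub *
entry_for_fd(struct file_stub **ofiles, int fd)
{
	return (ofiles[fd]);
}
```

The two helpers agree only when fd is 0, which is why the earlier output described stdin regardless of which descriptor fstat was called on.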
Very tactfully put :) Sorry about that.
None of the uap->fd values seem to be valid. In the first case, uap->fd is way
too high for the length of fd_ofiles, which only has 21 elements:
(kgdb) up 8
#8 0xc04c470d in fstat (td=0xc2eeb180, uap=0xd610dc74) at
/usr/src/sys/kern/kern_descrip.c:1075
1075 error = kern_fstat(td, uap->fd, &ub);
(kgdb) p uap->fd
$1 = 89
(kgdb) p *td->td_proc->p_fd->fd_ofiles[uap->fd]
Cannot access memory at address 0x0
In the second, uap->fd is nonsense:
(kgdb) up 8
#8 0xc04c470d in fstat (td=0xc3109300, uap=0xd617ec74) at
/usr/src/sys/kern/kern_descrip.c:1075
1075 error = kern_fstat(td, uap->fd, &ub);
(kgdb) p uap->fd
$1 = -1023449232
(kgdb)
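As a rough sketch of why both values look invalid, the check below tests a descriptor against the table length. It assumes only what is stated above (a table of 21 entries, and the two observed values 89 and -1023449232). The helper name fd_in_range is hypothetical and is not the kernel's own lookup routine.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: a descriptor can only index the open-file table
 * if it is non-negative and smaller than the number of table entries.
 */
static bool
fd_in_range(int fd, int nfiles)
{
	return (fd >= 0 && fd < nfiles);
}
```

With a 21-entry table, fd_in_range(89, 21) and fd_in_range(-1023449232, 21) are both false, so neither value could have named a slot in the table as it appears in the dump.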
Hmm. So, I reviewed audit_arg_file() closely, and after staring at the code a
lot, couldn't see anything obvious in either the socket or the vnode/fifo
case. I did fix one other bug there, however, which can never actually be
exercised in 7-CURRENT and is fairly unlikely in 6-STABLE; I will MFC that
fix in a week.
Could you try printing *td->td_ar? Maybe this will give us a clue as to how
far it got. In particular, this may be able to more reliably give us the file
descriptor number, which is audited early in the system call. You might find
that 'td' is corrupted in many layers of the stack; keep going up until you
find one where it's good. It may well be that td->td_ar->k_ar.ar_arg_fd is
correct, which might confirm that uap->fd is still correct. We'd also like to
know if ARG_SOCKINFO, ARG_VNODE1, or ARG_VNODE2 is set in the
k_ar.ar_valid_arg field. This may tell us some more about the file descriptor
even though it appears to have vanished.
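A minimal sketch of how the k_ar.ar_valid_arg check could be read follows. The flag names mirror the ones mentioned above, but the SKETCH_ bit values are placeholders invented for this example and are NOT the real definitions from the kernel's audit headers. The enum and the function name are also hypothetical. When inspecting a real dump, the printed field would need to be compared against the actual ARG_* definitions.

```c
#include <stdint.h>

/* Placeholder bit positions, NOT the real values from the audit headers. */
#define SKETCH_ARG_SOCKINFO	(UINT64_C(1) << 0)
#define SKETCH_ARG_VNODE1	(UINT64_C(1) << 1)
#define SKETCH_ARG_VNODE2	(UINT64_C(1) << 2)

/* Hypothetical classification of what the audit record captured. */
enum fd_kind {
	FD_KIND_UNKNOWN,	/* none of the three flags is set */
	FD_KIND_SOCKET,		/* socket information was recorded */
	FD_KIND_VNODE		/* vnode information was recorded */
};

/* Test the relevant bits of a valid-argument mask. */
static enum fd_kind
classify_valid_arg(uint64_t valid_arg)
{
	if (valid_arg & SKETCH_ARG_SOCKINFO)
		return (FD_KIND_SOCKET);
	if (valid_arg & (SKETCH_ARG_VNODE1 | SKETCH_ARG_VNODE2))
		return (FD_KIND_VNODE);
	return (FD_KIND_UNKNOWN);
}
```

If none of the three bits is set, the audit code presumably had not yet recorded details of the file when the panic occurred, which would also help narrow down how far it got.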
I'm quite worried by the fact that the file descriptor seems not to be present
any more -- this suggests a file descriptor related race of the sort that is
both quite difficult to figure out and also quite a risk. It's strange that
it would only trigger with audit, however--perhaps audit stretches out the
race. Is this an SMP box?
Could you print the entire contents of *td->td_proc->p_fd?
Thanks,
Robert N M Watson
Computer Laboratory
University of Cambridge