On Fri, 19 Sep 2008, Oleg V. Nauman wrote:
(1) Start by deleting all but one nameserver entry in /etc/resolv.conf.
Confirm that you can still reproduce the problem.
For various reasons my laptop runs a local caching DNS server (named)
without any forwarders assigned. My /etc/resolv.conf contains nameserver
127.0.0.1
This simplifies things in some respects but complicates them in others. In
particular, the question it raises is whether the problem is in the DNS
resolver or the nameserver. Seeing a tcpdump of lo0 for DNS traffic would be
quite interesting, since we could look at timestamps and try to place the
blame a bit more precisely.
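
As a rough complement to the tcpdump, a minimal Python sketch along these
lines could time individual UDP queries against the name server directly,
bypassing the libc resolver. The helper names (build_query, time_query), the
fixed query ID, and the 5-second default timeout are assumptions made for
illustration; nothing here comes from the reporter's setup.

```python
import socket
import struct
import time


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query packet for an A record (class IN)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: length-prefixed labels terminated by a zero byte.
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    # QTYPE=1 (A), QCLASS=1 (IN).
    return header + labels + b"\x00" + struct.pack("!HH", 1, 1)


def time_query(server, name, timeout=5.0, port=53):
    """Return the round-trip time in seconds, or None on timeout or error."""
    # Pick the address family from the literal so ::1 can be tested as well.
    family = socket.AF_INET6 if ":" in server else socket.AF_INET
    with socket.socket(family, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        start = time.monotonic()
        try:
            sock.sendto(build_query(name), (server, port))
            sock.recvfrom(512)
        except OSError:  # includes socket timeouts
            return None
        return time.monotonic() - start
```

Calling time_query("127.0.0.1", "example.org") a few times while the problem
is occurring, and then again with an external server address, might help show
whether the delay sits on the loopback path or in the resolver library.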
Could you also use procstat -k on the dig process to generate a kernel stack
trace for it?
Let's add to this list: when the problem happens, could you also procstat -k
the name server process(es)?
And here is the procstat -kk output for the waiting logger process:
  PID    TID COMM     TDNAME   KSTACK
 1421 100095 logger   -        mi_switch+0x2c8 sleepq_switch+0xd9
    sleepq_catch_signals+0x239 sleepq_wait_sig+0x14 _sleep+0x35f
    pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
    syscall+0x2b3 Xint0x80_syscall+0x20
Interesting -- logger is blocked on reading from a pipe, likely standard
input. So it sounds like something else is failing to complete in a timely
manner -- perhaps due to DNS.
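
To compare several such traces quickly, something like the following Python
sketch can pull the function names out of the KSTACK column of procstat -kk
output. The helper names (parse_kstack, describe) and the hint table are
hypothetical; only the pipe_read entry is taken from the trace above, and the
table would need extending by hand for other wait points.

```python
# Sample copied from the procstat -kk output quoted in this thread.
SAMPLE = """PID TID COMM TDNAME KSTACK
1421 100095 logger - mi_switch+0x2c8
sleepq_switch+0xd9 sleepq_catch_signals+0x239 sleepq_wait_sig+0x14
_sleep+0x35f pipe_read+0x389 dofileread+0x96 kern_readv+0x58 read+0x4f
syscall+0x2b3 Xint0x80_syscall+0x20"""

# Illustrative hint table (an assumption for this sketch, extend as needed).
HINTS = {"pipe_read": "blocked reading from a pipe"}


def parse_kstack(text):
    """Return kernel function names in the order listed, innermost first."""
    # Stack frames look like name+0xOFFSET; header and PID fields do not.
    return [tok.split("+0x", 1)[0] for tok in text.split() if "+0x" in tok]


def describe(frames):
    """Return a hint for the first recognised frame, or 'unknown'."""
    for name in frames:
        if name in HINTS:
            return HINTS[name]
    return "unknown"
```

Running describe(parse_kstack(SAMPLE)) on the logger trace picks out the
pipe_read frame, which matches the reading given above.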
This is approximately the date of my last UDP MFC. Could you try backing
out just src/sys/netinet6/udp6_usrreq.c revision 1.81.2.7 and see if that
helps? (specifically, restore the use of sosend_generic instead of
sosend_dgram)
If you can show that it's definitely a problem with the change to sosend_dgram
for UDPv6 socket send, then that might suggest the problem is related to the
UDPv46 code there. In that case I will propose we back out that portion of the
change in the 7-stable branch until it's known to be resolved -- I don't want
other people tripping over this.
Could you try compiling your kernel with WITNESS to see if we get any
extended debugging information?
I have added the WITNESS option (and STACK, required by procstat), but it is
not producing any output (so no LORs or anything like that).
OK. Could you try adding INVARIANT_SUPPORT and INVARIANTS if they aren't
there? Be aware: this may convert the wedging you are experiencing into a
kernel panic.
Is anybody else experiencing the same issues with a fresh RELENG_7? I'm not
sure whether it is a local issue, though.
I'm not experiencing them, but these sorts of things can be quite subtle
and workload-dependent.
Well, I'm experiencing this issue even during system boot.
OK. So there must be something a bit different about your setup -- perhaps
there's something specific about the way things are interacting over the
loopback address for the name server. Is this the stock system BIND9 or
something else? Are you able to temporarily switch to an external name server
and see if that changes things?
Robert N M Watson
Computer Laboratory
University of Cambridge