ntpd does not re-query servers, when a new interface appears
ntpd tracks interface updates, however it does not requery
servers, when they occur. This was less than an hour ago,
at my university, the notebook boots and is not connected
to anything:

9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 Disabled
9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled
9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 Enabled
9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 Enabled
9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for interface updates
9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from /var/db/ntpd.drift
9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', giving up on it
9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', giving up on it

So ntpd has given up on all the servers listed in the ntp.conf file.
I then proceed to connect to the wireless network and proceed to log
into two VPNs:

9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 193.196.120.15#123 Enabled
9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled

Over interface #8 some of the servers are actually available, but
ntpq -p still states:
No association ID's returned

Only when I restart ntpd, it operates as expected:
 remote refid st t when poll reach delay offset jitter
==
 zit-net2.uni-pa .STEP. 16 u- 51200.0000.000 0.000
 alpha.rueckgr.a .STEP. 16 u- 51200.0000.000 0.000
 ntp.goneco.de .STEP. 16 u- 51200.0000.000 0.000
+proxy4.rz.uni-k 129.13.64.17 2 u 30 128 2712.9372.530 1.891
+proxy2.rz.uni-k 129.13.64.17 2 u 58 128 3753.593 -8.981 1.837
*proxy1.rz.uni-k 129.13.64.17 2 u 15 128 2713.2978.244 1.487

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
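Editorial sketch, not part of the original message: the log above already shows when ntpd starts listening on a new interface, so a wrapper script could watch for those lines and decide when a restart is worth trying. The following is a minimal sketch, assuming the log format is exactly the one in the excerpt; the function name and the regular expression are illustrative only, and the restart itself is deliberately left out.

```python
import re

# Matches lines such as
#   9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
# The pattern is an assumption derived only from the log excerpt in this message.
LISTEN_RE = re.compile(
    r"ntpd\[\d+\]: Listening on interface #(\d+) (\S+), (\S+)#123 (Enabled|Disabled)"
)


def enabled_interfaces(log_lines):
    """Return (index, name, address) for every enabled interface ntpd reports."""
    found = []
    for line in log_lines:
        match = LISTEN_RE.search(line)
        if match and match.group(4) == "Enabled":
            found.append((int(match.group(1)), match.group(2), match.group(3)))
    return found


sample = [
    "9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled",
    "9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled",
    "9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled",
]
print(enabled_interfaces(sample))
```

Comparing the result of two successive runs would show which interfaces appeared since the last check; whether restarting ntpd at that point is the right reaction is a separate question.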
Re: is dtrace usable?
Quoting John Baldwin (from Mon, 8 Mar 2010 10:00:12 -0500):

> On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:
>> On Sat, 6 Mar 2010, Alexander Leidinger wrote:
>>
>> >> Take a look at the DTrace configuration information here:
>> >>
>> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
>> >
>> > I've just reread it (despite the fact that I already used it). Some
>> > comments:
>> >
>> > Last time I tried, I didn't see any problems by adding
>> >     makeoptions WITH_CTF=yes
>> > to the kernel config instead of doing
>> >     make WITH_CTF=1 kernel
>> >
>> > Did I miss something, and if not, shouldn't we tell about the
>> > makeoptions part instead (a kernel rebuild later will not cause
>> > trouble when someone forgets to do the WITH_CTF part as it is already
>> > in the kernel makefile)?
>>
>> I'll leave John to answer this one, CC line broadened.
>
> I would be very surprised if 'makeoptions WITH_CTF=yes' worked. The many
> times I and others have tried it, it did not work. Do you have a log of
> your build showing the ctfconvert and ctfmerge command lines?

I do not have a log around; it has been a while since I did something
with dtrace (a year ago), and I cannot remember that I always added
WITH_CTF on a build (but it was about SDT probes, not FBT probes, in
case it matters).

I had a look again: WITH_CTF=yes is one of the first lines in the
Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)".
"make -V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is
strange, isn't it? I would expect that NO_CTF is undefined. Is this a
bug in make, or a bug in the man page (neither in the description of
the different kinds of variables, nor in the description of "defined"
is something mentioned explaining this behavior)?

The current kernel on my test machine is compiled with
"makeoptions ...", and a "dtrace -l" causes the watchdog to trigger a
panic. Ugh... that's not a nice behavior. :(

Bye,
Alexander.

-- 
Adding sound to movies would be like putting lipstick on the Venus de Milo.
-- actress Mary Pickford, 1925

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)
On 8 March 2010, at 12:33, Robert Watson wrote:

>
> On Mon, 8 Mar 2010, Doug Hardie wrote:
>
>> I run a number of 4 core systems with em interfaces. These are production
>> systems that are unmanned and located a long way from me. Under unusual
>> conditions it can take up to 6 hours to get there. I have been waiting to
>> switch to 8.0 because of the discussions on the em device and now it sounds
>> like I had better just skip 8.x and wait for 9. 7.2 is working just fine.
>
> Not sure that any information in this survey thread should be relevant to
> that decision. This race has existed since before FreeBSD, having appeared
> in the original BSD network stack, and is just as present in FreeBSD 7.x as
> 8.x or 9.x. When I learned about the race during the early 7.x development
> cycle, I added a counter/statistic to measure how much it happened in
> practice, but was not able to exercise it in my testing, and so left the
> counter in to appear in 7.0 and later so that we could perform this survey as
> core counts/etc increase.
>
> The two likely outcomes were "it is never exercised" and "it is exercised but
> only very infrequently", neither really justifying the quite complex change
> to correct it given requirements at the time. On-going development work on
> the virtual network stack is what justifies correcting the bug at this point,
> moving from detecting and handling the race to preventing it from occurring as
> an invariant. The motivation here, BTW, is that we'd like to eliminate the
> type-stable storage requirement for connection state (which ensures that
> memory once used for a connection block is only ever used for connection
> blocks in the future), allowing memory to be fully freed when a virtual
> network stack is destroyed. Using type-stable storage helped address this
> bug, but was primarily present to reduce the overhead of monitoring using
> netstat(1). We'll now need to use a slightly more expensive solution (true
> reference counts) in that context, although in practice it will almost
> certainly be an unmeasurable cost.
>
> Which is to say that while there might be something in the em/altq/... thread
> to reasonably lead you to avoid 8.0, nothing in the TCP timer race thread
> should do so, since it affects 7.2 just as much as 8.0. Even if you do see a
> non-zero counter, that's not a matter for operational concern, just useful
> from the perspective of a network stack developer for understanding timing and
> behaviors in the stack. :-)

Thanks for the complete explanation. I don't believe the ALTQ issue will
affect me. I am not currently using it and do not expect to in the near
future. In addition, there was a posting that a fix for at least part of
that will be added in a week or so. Given all that, it appears it's time to
start the planning/testing process for 8.
Fwd: Re: NFS Client error
Thanks for your kind reply, I'm forwarding it there...

Original Message
Subject: Re: NFS Client error
Date: Mon, 08 Mar 2010 23:59:29 +0100
From: vol...@vwsoft.com
To: Giulio Ferro
CC: freebsd-hack...@freebsd.org, freebsd-...@freebsd.org

On 03/08/10 12:16, Giulio Ferro wrote:
> FreeBSD 8 stable amd64
>
> It mounts different file systems by NFS (with locking) on a
> data server directly connected (gigabit) to the server
>
> Apache running in several jails on those nfs folders.
>
> Now and then I get a huge slow-down. When I look in the logs
> I get thousands of lines like these:
> Mar 5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
> Mar 5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on signal 11
>
> What should I do?

Giulio,

it seems this is anyhow not related to network (nfs) operations. It's
looking like a problem in the VM. I think it makes sense to have a look
at the httpd.core file if the binary has been linked with debugging
symbols turned on. Also I think at first, it may not hurt to look at
vmstat -m output.

You may want to change ${subject} and post to stable@ to drive more
attention to your problem.

Volker
Re: Fwd: Re: NFS Client error
> On 03/08/10 12:16, Giulio Ferro wrote:
> > FreeBSD 8 stable amd64
> >
> > It mounts different file systems by NFS (with locking) on a
> > data server directly connected (gigabit) to the server
> >
> > Apache running in several jails on those nfs folders.
> >
> > Now and then I get a huge slow-down. When I look in the logs
> > I get thousands of lines like these:
> > Mar 5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
> > Mar 5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on signal 11
> >
> > What should I do?

If the binary (httpd) is on an NFS server, then this is what usually
happens when the binary gets modified.

my 2c
	danny
Many processes stuck in zfs
Over the past couple of months, I've more or less regularly observed
machines having more and more processes stuck in the zfs wchan. The
processes never recover from that, and trying to reboot only gets the
entire system stuck, without any console messages. I can enter the
debugger, and I have saved a couple of dumps.

The situation seems to be triggered by zfs receive'ing snapshots from
the sister machine (both synchronize their active ZFS filesystems to
each other, using zfs send and zfs receive). It appears it's the
receiving causing trouble.

Both machines run 8-stable from mid-February, with a single-disk ZFS
pool, with ARC limited to 512M, prefetch and ZIL disabled via
loader.conf.

What should I be looking at to further diagnose?


Thanks,
Stefan

-- 
Stefan Bethke   Fon +49 151 14070811
Re: ZFS hot spares
On Mon, Mar 08, 2010 at 01:06:10PM -0500, Steve Polyack wrote:
> ZFS in FreeBSD lacks at least one major feature from the Solaris
> version: hot spares. There is a PR open at
> http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't been
> any motion/thoughts posted on it since its creation almost one year ago.
>
> I'm aware that on Solaris, hot spare replacement is handled by a few
> Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug
> into the Solaris FMA (Fault Management Architecture). Have there been
> any thoughts on porting these over or getting something similar running
> within FreeBSD? With all of the recent SATA/SAS CAM hotplug work now
> committed, it would be nice to have automatic replacement of hot spares
> with a future hot-replacement of the failed drive.
>
> On the other side, I'd be interested in hearing if anyone has had
> success in rolling their own scripted solution: i.e. something which
> polls 'zpool status' looking for failed drives and performing hot-spare
> replacements automatically.

Currently FreeBSD's ZFS sends various events to devd. It should be
possible to implement some scripts (or maybe reuse
zfs-retire/zfs-diagnose?) to perform 'zpool replace' when a disk
disappears, etc. This shouldn't be very hard, modulo bugs in FreeBSD/ZFS,
as this functionality, because unused, wasn't tested.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
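Editorial sketch, not part of the original message: the polling idea raised in the quoted question (scan 'zpool status' for failed drives) could start from something like the following. It is a minimal sketch under stated assumptions: it only parses text that has already been captured, the sample output and the pool and device names (tank, ada0, ada1) are hypothetical, and the set of states treated as "failed" is a choice made for this example. Running 'zpool status' and issuing 'zpool replace' are left to the caller.

```python
# Device states treated as "failed" in this sketch (an editorial choice).
FAILED_STATES = {"FAULTED", "UNAVAIL", "REMOVED"}


def failed_devices(status_text):
    """Return (device, state) pairs for failed rows in captured `zpool status` text."""
    failed = []
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True       # header row of the device table
            continue
        if in_table and not fields:
            in_table = False      # a blank line ends the table
            continue
        if in_table and len(fields) >= 2 and fields[1] in FAILED_STATES:
            failed.append((fields[0], fields[1]))
    return failed


# Hypothetical output, shaped like a degraded two-disk mirror.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            ada0    ONLINE       0     0     0
            ada1    UNAVAIL      0     0     0  cannot open

errors: No known data errors
"""
print(failed_devices(SAMPLE))
```

A cron job built on this would still have to pick a spare and call 'zpool replace' itself; reacting to the devd events mentioned above avoids the polling delay but needs the event handling to be tested first.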
Re: Fwd: Re: NFS Client error
On 09.03.2010 10:14, Daniel Braniss wrote:
> If the binary (httpd) is on a nfs server, then if the binary got
> modified this is what usually happens

Nope. The binary is on the jails on the local machine.
Only the configuration dir (etc/apache22) and data dir (www) are on the
nfs server.

 -------------------
| NFS CLIENT        |
|  jail 1 : httpd   |
|  jail 2 : httpd   | --> NFS SERVER
|  jail 3 : httpd   |
|  ...              |
 -------------------

Giulio.
Re: ntpd does not re-query servers, when a new interface appears
On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> ntpd tracks interface updates, however it does not requery
> servers, when they occur. This was less than an hour ago,
> at my university, the notebook boots and is not connected
> to anything:
>
> 9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
> 9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 Disabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 Enabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 Enabled
> 9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for interface updates
> 9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
> 9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from /var/db/ntpd.drift
> 9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', giving up on it
> 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
> 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', giving up on it
>
> So ntpd has given up on all the servers listed in the ntp.conf file.

Yes, but it looks more like it's the name service that's not operating;
ntpd seems to be doing its best but can't resolve the hostnames?

> I then proceed to connect to the wireless network and proceed to log
> into two VPNs:
>
> 9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
> 9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 193.196.120.15#123 Enabled
> 9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled
>
> Over interface #8 some of the servers are actually available, but
> ntpq -p still states:
> No association ID's returned
>
> Only when I restart ntpd, it operates as expected:
>  remote refid st t when poll reach delay offset jitter
> ==
>  zit-net2.uni-pa .STEP. 16 u- 51200.0000.000 0.000
>  alpha.rueckgr.a .STEP. 16 u- 51200.0000.000 0.000
>  ntp.goneco.de .STEP. 16 u- 51200.0000.000 0.000
> +proxy4.rz.uni-k 129.13.64.17 2 u 30 128 2712.9372.530 1.891
> +proxy2.rz.uni-k 129.13.64.17 2 u 58 128 3753.593 -8.981 1.837
> *proxy1.rz.uni-k 129.13.64.17 2 u 15 128 2713.2978.244 1.487

I've always had to restart named after losing / regaining an interface,
most noticeably after a suspend/resume (e.g. a low battery suspend), so I
run /etc/rc.d/named restart from rc.resume. This looks like a similar
issue perhaps, though I don't see why restarting only ntpd would fix it.

HTH, Ian
Re: Many processes stuck in zfs
On Tue, 9 Mar 2010 10:15:53 +0100, Stefan Bethke wrote:

> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan. The
> processes never recover from that, and trying to reboot only gets the
> entire system stuck, without any console messages. I can enter the
> debugger, and I have saved a couple of dumps.
>
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive). It appears it's the
> receiving causing trouble.
>
> Both machines run 8-stable from mid-February, with a single-disk ZFS
> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> loader.conf.
>
> What should I be looking at to further diagnose?
>
>
> Thanks,
> Stefan
>

Hi,

I encounter almost the same problem with an 8-STABLE build from the same
time. When working a lot on files inside ~/, the directory gets locked,
and any command trying to access it (from "ls" to applications reading
their configuration files) gets stuck.

The system is an amd64 desktop computer with 4 GiB of memory, and
vfs.zfs.prefetch_disable is set to 0.
Re: ntpd does not re-query servers, when a new interface appears
On Tue, Mar 09, 2010 at 09:27:35PM +1100, Ian Smith wrote:
> I've always had to restart named after losing / regaining an interface,
> most noticeably after a suspend/resume (eg a low battery suspend), so I
> run /etc/rc.d/named restart from rc.resume. This looks like a similar
> issue perhaps, though I don't see why restarting only ntpd would fix it.

named is supposed to auto-probe for interfaces at a specific interval;
see the "interface-interval" option. I forget what the default is, but
on our servers we explicitly disable it by setting it to 0.

-- 
| Jeremy Chadwick                                   j...@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Many processes stuck in zfs
On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote:
>Over the past couple of months, I've more or less regularly observed machines
>having more and more processes stuck in the zfs wchan. The processes never
>recover from that,

How long have you waited? There seems to be a problem with low free memory
handling that causes ZFS to turn into cold molasses. The work-around is to
run a program that allocates a decent size chunk of memory and then exits.
The original suggestion was something like:
  perl -e '@x = (0) x 100;'
I've written a short program that allocates and dirties ~100MB and then
exits, and I run it from cron.

-- 
Peter Jeremy
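Editorial sketch, not part of the original message and not Peter's actual program (which was not posted): a minimal illustration of "allocate and dirty ~100MB, then exit". The 4096-byte page size is an assumption, and the function name is made up for this example. Writing one byte per page is what forces the memory to be really backed rather than just reserved.

```python
PAGE = 4096                   # assumed page size; one write per page dirties it
SIZE = 100 * 1024 * 1024      # roughly the 100 MB mentioned above


def allocate_and_dirty(size=SIZE, page=PAGE):
    """Allocate `size` bytes and write to every page so the memory is really used."""
    buf = bytearray(size)
    for offset in range(0, size, page):
        buf[offset] = 1
    return len(buf)


if __name__ == "__main__":
    allocate_and_dirty()      # the memory is released when the process exits
```

Run from cron at some interval, this only applies the work-around described above; it does nothing about the underlying low-memory handling.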
freebsd 7.2stable em0: discard frame w/o packet header
The system panics with the error

kernel: em0: discard frame w/o packet header

Crash dump one

kgdb /boot/kernel/kernel /var/crash/vmcore.5

Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address = 0xd9c6f105
fault code = supervisor read, page not present
instruction pointer = 0x20:0x808a021f
stack pointer = 0x28:0x85909c10
frame pointer = 0x28:0x85909c44
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 14 (swi4: clock sio)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 11h51m2s
Physical memory: 1936 MB
Dumping 331 MB: 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

(kgdb) where
#0 doadump () at pcpu.h:196
#1 0x8069a618 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0x8069a8f5 in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0x8095f294 in trap_fatal (frame=0x85909bd0, eva=3653693701) at /usr/src/sys/i386/i386/trap.c:950
#4 0x8095f4fd in trap_pfault (frame=0x85909bd0, usermode=0, eva=3653693701) at /usr/src/sys/i386/i386/trap.c:863
#5 0x8095febd in trap (frame=0x85909bd0) at /usr/src/sys/i386/i386/trap.c:541
#6 0x8094479b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7 0x808a021f in uma_zfree_arg (zone=0x86a3b000, item=0x8cdcd580, udata=0x868fa000) at /usr/src/sys/vm/uma_core.c:2253
#8 0x8075492b in ng_netflow_expire (arg=0x868fa000) at /usr/src/sys/netgraph/netflow/netflow.c:215
#9 0x806ac441 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#10 0x806789c9 in ithread_loop (arg=0x85c9d3d0) at /usr/src/sys/kern/kern_intr.c:1181
#11 0x80675271 in fork_exit (callout=0x8067882d , arg=0x85c9d3d0, frame=0x85909d38) at /usr/src/sys/kern/kern_fork.c:811
#12 0x80944810 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271

Crash dump two

kgdb /boot/kernel.old/kernel /var/crash/vmcore.4

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0xc
fault code = supervisor read, page not present
instruction pointer = 0x20:0x806ec864
stack pointer = 0x28:0x85912a64
frame pointer = 0x28:0x85912a90
code segment = base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, def32 1, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 3 (ng_queue1)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 12h4m33s
Physical memory: 1936 MB
Dumping 154 MB: 139 123 107 91 75 59 43 27 11

(kgdb) where
#0 doadump () at pcpu.h:196
#1 0x8069a618 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2 0x8069a8f5 in panic (fmt=Variable "fmt" is not available. ) at /usr/src/sys/kern/kern_shutdown.c:574
#3 0x8095f294 in trap_fatal (frame=0x85912a24, eva=12) at /usr/src/sys/i386/i386/trap.c:950
#4 0x8095f4fd in trap_pfault (frame=0x85912a24, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:863
#5 0x8095febd in trap (frame=0x85912a24) at /usr/src/sys/i386/i386/trap.c:541
#6 0x8094479b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7 0x806ec864 in m_copym (m=0x0, off0=1496, len=1496, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:539
#8 0x807912a3 in ip_fragment (ip=0x862146d6, m_frag=0x85912b5c, mtu=1500, if_hwassist_flags=38, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:731
#9 0x80791f4d in ip_output (m=0x86214600, opt=0x0, ro=0x85912b30, flags=32, imo=0x0, inp=0x88518bf4) at /usr/src/sys/netinet/ip_output.c:570
#10 0x8079332f in rip_output (m=0x86341d00, so=0x87a86d00, dst=3540521226) at /usr/src/sys/netinet/raw_ip.c:408
#11 0x807933f4 in rip_send (so=0x87a86d00, flags=0, m=0x86341d00, nam=0x0, control=0x0, td=0x85cff480) at /usr/src/sys/netinet/raw_ip.c:880
#12 0x806f590d in sosend_generic (so=0x87a86d00, addr=0x0, uio=0x0, top=0x86341d00, control=0x0, flags=0, td=0x85cff480) at /usr/src/sys/kern/uipc_socket.c:1243
#13 0x806f1767 in sosend (so=0x87a86d00, addr=0x0, uio=0x0, top=0x86341d00, control=0x0, flags=0, td=0x85cff480) at /usr/src/sys/kern/uipc_socket.c:1285
#14 0x807652e4 in ng_ksocket_rcvdata (hook=0x89164180, item=0x86925a50) at /usr/src/sys/netgraph/ng_ksocket.c:927
#15 0x8075986f in ng_apply_item (node=0x88f33900, item=0x86925a50, rw=0) at /usr/src/sys/netgraph/ng_base.c:2336
#16 0x8075acf5 in ngthread (arg=0x0) at /usr/src/sys/netgraph/ng_base.c:3304
#17 0x80675271 in fork_exit (callout=0x8075aaa9 , arg=0x0, frame=0x85912d38) at /usr/src/sys/kern/kern_fork.c:811
#18 0x80944810 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271
Re: Many processes stuck in zfs
On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> Over the past couple of months, I've more or less regularly observed machines
> having more and more processes stuck in the zfs wchan. The processes never
> recover from that, and trying to reboot only gets the entire system stuck,
> without any console messages. I can enter the debugger, and I have saved a
> couple of dumps.
>
> The situation seems to be triggered by zfs receive'ing snapshots from the
> sister machine (both synchronize their active ZFS filesystems to each other,
> using zfs send and zfs receive). It appears it's the receiving causing
> trouble.
>
> Both machines run 8-stable from mid-February, with a single-disk ZFS pool,
> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>
> What should I be looking at to further diagnose?

What kind of hardware do you have there? There is a 3-way deadlock, for
which I have a fix, that would be hard to trigger on single- or dual-core
machines.

Feel free to try the fix:

	http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: ntpd does not re-query servers, when a new interface appears
On Tue, Mar 09, 2010 at 02:17:48PM +0100, Dominic Fandrey wrote:
> On 09/03/2010 11:27, Ian Smith wrote:
> > On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> > > ntpd tracks interface updates, however it does not requery
> > > servers, when they occur. This was less than an hour ago,
> > > at my university, the notebook boots and is not connected
> > > to anything:
> > >
> > > 9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
> > > 9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 Disabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 Enabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 Enabled
> > > 9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for interface updates
> > > 9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
> > > 9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from /var/db/ntpd.drift
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', giving up on it
> > > 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
> > > 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', giving up on it
> > >
> > > So ntpd has given up on all the servers listed in the ntp.conf file.
> >
> > Yes, but it looks more like name service that's not operating, ntpd
> > seems to be doing its best but can't resolve the hostnames?
>
> Why would I have named running on a notebook? This is a notebook,
> which is not connected to the internet.
>
> > > I then proceed to connect to the wireless network and proceed to log
> > > into two VPNs:
> > >
> > > 9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
> > > 9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 193.196.120.15#123 Enabled
> > > 9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled
> > >
> > > Over interface #8 some of the servers are actually available, but
> > > ntpq -p still states:
> > > No association ID's returned
> > >
> > > Only when I restart ntpd, it operates as expected:
> > >      remote           refid      st t when poll reach   delay   offset  jitter
> > > ==============================================================================
> > >  zit-net2.uni-pa .STEP.          16 u    -  512    0    0.000    0.000   0.000
> > >  alpha.rueckgr.a .STEP.          16 u    -  512    0    0.000    0.000   0.000
> > >  ntp.goneco.de   .STEP.          16 u    -  512    0    0.000    0.000   0.000
> > > +proxy4.rz.uni-k 129.13.64.17     2 u   30  128 2712.9372.530   1.891
> > > +proxy2.rz.uni-k 129.13.64.17     2 u   58  128 3753.593 -8.981   1.837
> > > *proxy1.rz.uni-k 129.13.64.17     2 u   15  128 2713.2978.244   1.487
> >
> > I've always had to restart named after losing / regaining an interface,
> > most noticeably after a suspend/resume (eg a low battery suspend), so I
> > run /etc/rc.d/named restart from rc.resume. This looks like a similar
> > issue perhaps, though I don't see why restarting only ntpd would fix it.
>
> As I said, named doesn't run at all. When the notebook gets an
> internet connection, ntpd recognizes this. It somehow doesn't
> occur to it, though, that it might be able to resolve the
> servers, now.

I believe this is the problem. Note that you'll need to add an SSL cert.
exception for this site due to them using self-signed certs.

https://support.ntp.org/bugs/show_bug.cgi?id=987

-- 
| Jeremy Chadwick j...@
Re: Many processes stuck in zfs
On 09.03.2010 at 11:53, Peter Jeremy wrote:
> On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed
>> machines having more and more processes stuck in the zfs wchan. The
>> processes never recover from that,
>
> How long have you waited?

Many hours, sometimes up to 48 hours (when I didn't notice the stuck
processes at first).

> There seems to be a problem with low free memory handling that causes ZFS
> to turn into cold molasses. The work-around is to run a program that
> allocates a decent size chunk of memory and then exits. The original
> suggestion was something like:
> perl -e '@x = (0) x 100;'
> I've written a short program that allocates and dirties ~100MB and then
> exits and run it from cron.

I'll try that the next time I encounter the stuck processes.

I'm recording ZFS ARC stats with munin. Would I be able to identify such a
low-memory situation from there? Would it make sense to monitor other stats?

Thanks,
Stefan

-- 
Stefan Bethke    Fon +49 151 14070811

_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
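A minimal sketch of the kind of helper Peter describes: a short program that
allocates and dirties roughly 100 MB and then exits, meant to be run from
cron. This is not his program, which was not posted. The function name and
the one-write-per-page approach are assumptions made for illustration.

```python
import mmap


def allocate_and_dirty(size_bytes, page_size=mmap.PAGESIZE):
    """Allocate size_bytes and write one byte into every page.

    Writing to each page forces the kernel to back the allocation with
    real memory instead of leaving it lazily mapped. Returns the number
    of pages touched. The buffer is released when the function returns.
    """
    buf = bytearray(size_bytes)
    touched = 0
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 1  # dirty this page
        touched += 1
    return touched
```

Calling allocate_and_dirty(100 * 1024 * 1024) from a small script started by
cron would match the size mentioned above. Whether that is enough to relieve
the ZFS slowdown on a given machine has not been tested here.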
Re: Many processes stuck in zfs
On Mar 9, 2010, at 1:58 PM, Pawel Jakub Dawidek wrote:
>>> What kind of hardware do you have there? There is a 3-way deadlock I've a
>>> fix for, which would be hard to trigger on single- or dual-core machines.
>>>
>>> Feel free to try the fix:
>>>
>>> http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
>>
>> Maybe related to the deadlock I reported when I was receiving an incremental
>> snapshot while the target dataset was being read?
>
> Could be. This deadlock is in general related to zfs recv functionality.

Aye aye, Sir

set fingers -position crossed

testing :)

Borja
Re: is dtrace usable?
On Tuesday 09 March 2010 3:27:09 am Alexander Leidinger wrote:
> Quoting John Baldwin (from Mon, 8 Mar 2010 10:00:12 -0500):
>
> > On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:
> >> On Sat, 6 Mar 2010, Alexander Leidinger wrote:
> >>
> >> >> Take a look at the DTrace configuration information here:
> >> >>
> >> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
> >> >
> >> > I've just reread it (despite the fact that I already used it). Some
> >> > comments:
> >> >
> >> > Last time I tried, I didn't see any problems by adding
> >> >     makeoptions WITH_CTF=yes
> >> > to the kernel config instead of doing
> >> >     make WITH_CTF=1 kernel
> >> >
> >> > Did I miss something, and if not, shouldn't we tell about the
> >> > makeoptions part instead (a kernel rebuild later will not cause
> >> > trouble when someone forgets to do the WITH_CTF part as it is already
> >> > in the kernel makefile)?
> >>
> >> I'll leave John to answer this one, CC line broadened.
> >
> > I would be very surprised if 'makeoptions WITH_CTF=yes' worked. The many
> > times I and others have tried it it did not work. Do you have a log of your
> > build showing the ctfconvert and ctfmerge command lines?
>
> I do not have a log around, it has been a while since I did something
> with dtrace (a year ago) and I can not remember that I always added
> WITH_CTF on a build (but it was about SDT probes, not FBT probes, in
> case it matters).
>
> I had a look again, WITH_CTF=yes is one of the first lines in the
> Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)". "make
> -V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is
> strange, isn't it? I would expect that NO_CTF is undefined. Is this a
> bug in make, or a bug in the man page (neither in the description of
> the different kinds of variables, nor in the description of "defined"
> is something mentioned explaining this behavior).

It is defined behavior. From the 2nd and 3rd paragraphs of the make(1)
manual page:

     First of all, the initial list of specifications will be read from the
     system makefile, sys.mk, unless inhibited with the -r option. The
     standard sys.mk as shipped with FreeBSD also handles make.conf(5), the
     default path to which can be altered via the make variable __MAKE_CONF.

     Then the first of BSDmakefile, makefile, and Makefile that can be found
     in the current directory, object directory (see .OBJDIR), or search path
     (see the -I option) will be read for the main list of dependency
     specifications. A different makefile or list of them can be supplied via
     the -f option(s). Finally, if the file .depend can be found in any of
     the aforesaid locations, it will also be read (see mkdep(1)).

From this you can see that sys.mk is included and parsed before 'Makefile',
so the WITH_CTF=yes is not set until after sys.mk has been parsed.

-- 
John Baldwin
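The ordering John describes can be modelled in a few lines. This is a toy
model, not make(1) itself: the function name is made up, and the body of
sys.mk is reduced to the single !defined(WITH_CTF) test mentioned above,
which is assumed here to set NO_CTF=1.

```python
def evaluate(cmdline_vars, makefile_vars):
    """Toy model of the start-up order: command line, sys.mk, Makefile."""
    # 1. Variables given on the command line are known from the start.
    variables = dict(cmdline_vars)

    # 2. sys.mk is read next. Assumed body: .if !defined(WITH_CTF) NO_CTF=1
    if "WITH_CTF" not in variables:
        variables["NO_CTF"] = "1"

    # 3. Only now is the Makefile read. Command-line values take precedence
    #    over assignments made in the Makefile.
    for name, value in makefile_vars.items():
        variables.setdefault(name, value)
    return variables


# WITH_CTF set only in the Makefile: sys.mk has already defined NO_CTF.
from_makefile = evaluate({}, {"WITH_CTF": "yes"})

# WITH_CTF given on the command line: sys.mk sees it, NO_CTF stays unset.
from_cmdline = evaluate({"WITH_CTF": "1"}, {})
```

In this model from_makefile contains both WITH_CTF and NO_CTF, which matches
the "make -V" output Alexander reports, while from_cmdline contains only
WITH_CTF.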
Re: ntpd does not re-query servers, when a new interface appears
On 09/03/2010 11:27, Ian Smith wrote:
> On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> > ntpd tracks interface updates, however it does not requery
> > servers, when they occur. This was less than an hour ago,
> > at my university, the notebook boots and is not connected
> > to anything:
> >
> > [..]
> >
> > So ntpd has given up on all the servers listed in the ntp.conf file.
>
> Yes, but it looks more like name service that's not operating, ntpd
> seems to be doing its best but can't resolve the hostnames?

Why would I have named running on a notebook? This is a notebook,
which is not connected to the internet.

> > I then proceed to connect to the wireless network and proceed to log
> > into two VPNs:
> >
> > 9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
> > 9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 193.196.120.15#123 Enabled
> > 9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled
> >
> > Over interface #8 some of the servers are actually available, but
> > ntpq -p still states:
> > No association ID's returned
> >
> > Only when I restart ntpd, it operates as expected:
> >
> > [..]
>
> I've always had to restart named after losing / regaining an interface,
> most noticeably after a suspend/resume (eg a low battery suspend), so I
> run /etc/rc.d/named restart from rc.resume. This looks like a similar
> issue perhaps, though I don't see why restarting only ntpd would fix it.

As I said, named doesn't run at all. When the notebook gets an
internet connection, ntpd recognizes this. It somehow doesn't
occur to it, though, that it might be able to resolve the
servers, now.

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
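Until ntpd retries name resolution by itself, a cron-driven check could
restart it whenever it has no associations. A rough sketch, assuming ntpq is
in PATH and that "/etc/rc.d/ntpd restart" is the right way to restart ntpd on
the machine. The function names are made up for illustration.

```python
import subprocess

# The message quoted above from "ntpq -p" when ntpd has no servers at all.
NO_PEERS = "No association ID's returned"


def needs_restart(ntpq_output):
    """Return True when the ntpq output says ntpd has no associations."""
    return NO_PEERS in ntpq_output


def check_and_restart():
    """Run "ntpq -p" and restart ntpd if it reports no associations."""
    # stdout and stderr are merged because it is not verified here which
    # stream carries the message.
    result = subprocess.run(
        ["ntpq", "-p"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    if needs_restart(result.stdout):
        # Assumed restart command, see the note above.
        subprocess.run(["/etc/rc.d/ntpd", "restart"], check=True)
        return True
    return False
```

check_and_restart() is meant to be called from a script that cron runs every
few minutes. It does nothing about the underlying problem, which is tracked
in the ntp bug report referenced in this thread.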
Re: Many processes stuck in zfs
On 09.03.2010 at 13:29, Pawel Jakub Dawidek wrote:
> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed
>> machines having more and more processes stuck in the zfs wchan. The
>> processes never recover from that, and trying to reboot only gets the entire
>> system stuck, without any console messages. I can enter the debugger, and I
>> have saved a couple of dumps.
>>
>> The situation seems to be triggered by zfs receive'ing snapshots from the
>> sister machine (both synchronize their active ZFS filesystems to each other,
>> using zfs send and zfs receive). It appears it's the receiving causing
>> trouble.
>>
>> Both machines run 8-stable from mid-February, with a single-disk ZFS pool,
>> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>>
>> What should I be looking at to further diagnose?
>
> What kind of hardware do you have there? There is a 3-way deadlock I've a
> fix for, which would be hard to trigger on single- or dual-core machines.

FreeBSD lokschuppen.zs64.net 8.0-STABLE FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010 r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64

Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010
    r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E7300 @ 2.66GHz (2666.65-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff
  Features2=0x8e39d
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4081422336 (3892 MB)

> Feel free to try the fix:
>
> http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

I'll give it a shot on one of the two boxes.

Stefan

-- 
Stefan Bethke    Fon +49 151 14070811
Re: Many processes stuck in zfs
On Tue, Mar 09, 2010 at 01:57:07PM +0100, Borja Marcos wrote:
>
> On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:
>
> > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> >> Over the past couple of months, I've more or less regularly observed
> >> machines having more and more processes stuck in the zfs wchan. The
> >> processes never recover from that, and trying to reboot only gets the
> >> entire system stuck, without any console messages. I can enter the
> >> debugger, and I have saved a couple of dumps.
> >>
> >> The situation seems to be triggered by zfs receive'ing snapshots from the
> >> sister machine (both synchronize their active ZFS filesystems to each
> >> other, using zfs send and zfs receive). It appears it's the receiving
> >> causing trouble.
> >>
> >> Both machines run 8-stable from mid-February, with a single-disk ZFS pool,
> >> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
> >>
> >> What should I be looking at to further diagnose?
> >
> > What kind of hardware do you have there? There is a 3-way deadlock I've a
> > fix for, which would be hard to trigger on single- or dual-core machines.
> >
> > Feel free to try the fix:
> >
> > http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
>
> Maybe related to the deadlock I reported when I was receiving an incremental
> snapshot while the target dataset was being read?

Could be. This deadlock is in general related to zfs recv functionality.

-- 
Pawel Jakub Dawidek                       http://www.wheelsystems.com
p...@freebsd.org                           http://www.FreeBSD.org
FreeBSD committer                         Am I Evil? Yes, I Am!
Re: Many processes stuck in zfs
On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:
> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed
>> machines having more and more processes stuck in the zfs wchan. The
>> processes never recover from that, and trying to reboot only gets the entire
>> system stuck, without any console messages. I can enter the debugger, and I
>> have saved a couple of dumps.
>>
>> The situation seems to be triggered by zfs receive'ing snapshots from the
>> sister machine (both synchronize their active ZFS filesystems to each other,
>> using zfs send and zfs receive). It appears it's the receiving causing
>> trouble.
>>
>> Both machines run 8-stable from mid-February, with a single-disk ZFS pool,
>> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>>
>> What should I be looking at to further diagnose?
>
> What kind of hardware do you have there? There is a 3-way deadlock I've a
> fix for, which would be hard to trigger on single- or dual-core machines.
>
> Feel free to try the fix:
>
> http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

Maybe related to the deadlock I reported when I was receiving an incremental
snapshot while the target dataset was being read?

Borja.
Re: is dtrace usable?
Quoting John Baldwin (from Tue, 9 Mar 2010 07:47:00 -0500):

> On Tuesday 09 March 2010 3:27:09 am Alexander Leidinger wrote:
> > Quoting John Baldwin (from Mon, 8 Mar 2010 10:00:12 -0500):
> >
> > > On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:
> > >> On Sat, 6 Mar 2010, Alexander Leidinger wrote:
> > >>
> > >> >> Take a look at the DTrace configuration information here:
> > >> >>
> > >> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
> > >> >
> > >> > I've just reread it (despite the fact that I already used it). Some
> > >> > comments:
> > >> >
> > >> > Last time I tried, I didn't see any problems by adding
> > >> >     makeoptions WITH_CTF=yes
> > >> > to the kernel config instead of doing
> > >> >     make WITH_CTF=1 kernel
> > >> >
> > >> > Did I miss something, and if not, shouldn't we tell about the
> > >> > makeoptions part instead (a kernel rebuild later will not cause
> > >> > trouble when someone forgets to do the WITH_CTF part as it is already
> > >> > in the kernel makefile)?
> > >>
> > >> I'll leave John to answer this one, CC line broadened.
> > >
> > > I would be very surprised if 'makeoptions WITH_CTF=yes' worked. The many
> > > times I and others have tried it it did not work. Do you have a log of your
> > > build showing the ctfconvert and ctfmerge command lines?
> >
> > I do not have a log around, it has been a while since I did something
> > with dtrace (a year ago) and I can not remember that I always added
> > WITH_CTF on a build (but it was about SDT probes, not FBT probes, in
> > case it matters).
> >
> > I had a look again, WITH_CTF=yes is one of the first lines in the
> > Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)". "make
> > -V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is
> > strange, isn't it? I would expect that NO_CTF is undefined. Is this a
> > bug in make, or a bug in the man page (neither in the description of
> > the different kinds of variables, nor in the description of "defined"
> > is something mentioned explaining this behavior).
>
> It is defined behavior. From the 2nd and 3rd paragraphs of the make(1)
> manual page:
>
>      First of all, the initial list of specifications will be read from the
>      system makefile, sys.mk, unless inhibited with the -r option. The
>      standard sys.mk as shipped with FreeBSD also handles make.conf(5), the
>      default path to which can be altered via the make variable __MAKE_CONF.
>
>      Then the first of BSDmakefile, makefile, and Makefile that can be found
>      in the current directory, object directory (see .OBJDIR), or search path
>      (see the -I option) will be read for the main list of dependency
>      specifications. A different makefile or list of them can be supplied via
>      the -f option(s). Finally, if the file .depend can be found in any of
>      the aforesaid locations, it will also be read (see mkdep(1)).
>
> From this you can see that sys.mk is included and parsed before 'Makefile',
> so the WITH_CTF=yes is not set until after sys.mk has been parsed.

I think we need to find a different solution for this. The need to
specify WITH_CTF at the command line is very error prone. :(

Bye,
Alexander.

-- 
Every time I lose weight, it finds me again!

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org       netchild @ FreeBSD.org  : PGP ID = 72077137
Re: ntpd does not re-query servers, when a new interface appears
On Tue, Mar 09, 2010 at 05:30:45AM -0800, Jeremy Chadwick wrote:
> On Tue, Mar 09, 2010 at 02:17:48PM +0100, Dominic Fandrey wrote:
> > On 09/03/2010 11:27, Ian Smith wrote:
> > > On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> > > > ntpd tracks interface updates, however it does not requery
> > > > servers, when they occur. This was less than an hour ago,
> > > > at my university, the notebook boots and is not connected
> > > > to anything:
> > > >
> > > > [..]
> > > >
> > > > So ntpd has given up on all the servers listed in the ntp.conf file.
> > >
> > > Yes, but it looks more like name service that's not operating, ntpd
> > > seems to be doing its best but can't resolve the hostnames?
> >
> > Why would I have named running on a notebook? This is a notebook,
> > which is not connected to the internet.
> >
> > > > I then proceed to connect to the wireless network and proceed to log
> > > > into two VPNs:
> > > >
> > > > [..]
> > > >
> > > > Over interface #8 some of the servers are actually available, but
> > > > ntpq -p still states:
> > > > No association ID's returned
> > > >
> > > > Only when I restart ntpd, it operates as expected:
> > > >
> > > > [..]
> > >
> > > I've always had to restart named after losing / regaining an interface,
> > > most noticeably after a suspend/resume (eg a low battery suspend), so I
> > > run /etc/rc.d/named restart from rc.resume. This looks like a similar
> > > issue perhaps, though I don't see why restarting only ntpd would fix it.
> >
> > As I said, named doesn't run at all. When the notebook gets an
> > internet connection, ntpd recognizes this. It somehow doesn't
> > occur to it, though, that it might
Re: is dtrace usable?
On Mar 9, 2010, at 2:16 PM, Alexander Leidinger wrote:

>> From this you can see that sys.mk is included and parsed before 'Makefile',
>> so the WITH_CTF=yes is not set until after sys.mk has been parsed.
>
> I think we need to find a different solution for this. The need to specify
> WITH_CTF at the command line is very error prone. :(

You are neither the first person to have made this observation, nor the
first person to have failed to propose a solution in the form of a patch :-).

Robert
Re: ntpd does not re-query servers, when a new interface appears
On Tue, 9 Mar 2010, Jeremy Chadwick wrote:
> On Tue, Mar 09, 2010 at 09:27:35PM +1100, Ian Smith wrote:
[..]
> > Yes, but it looks more like name service that's not operating, ntpd
> > seems to be doing its best but can't resolve the hostnames?

Right smell, wrong pooch :) Thanks for the pointer to the ntp buglist.

> > I've always had to restart named after losing / regaining an interface,
> > most noticeably after a suspend/resume (eg a low battery suspend), so I
> > run /etc/rc.d/named restart from rc.resume. This looks like a similar
> > issue perhaps, though I don't see why restarting only ntpd would fix it.
>
> named is supposed to auto-probe for interfaces at a specific interval;
> see the "interface-interval" option. I forget what the default is,
> but on our servers we explicitly disable it by setting it to 0.

	// We have no dynamic interfaces, so BIND shouldn't need to
	// poll for interface state {UP|DOWN}.
	// (will this fix need to reload after suspend/resume?)
	interface-interval 0;

It's rare, maybe twice a year, that this laptop cum server suspends from
a 2hr+ lack of power - inverter failures, rewiring etc - so restarting
named on resume makes more sense than constant iface polling 'in case'.

cheers, Ian
Re: ZFS hot spares
On 03/09/10 05:11, Ivan Voras wrote:
> On 03/08/10 19:06, Steve Polyack wrote:
> > ZFS in FreeBSD lacks at least one major feature from the Solaris
> > version: hot spares. There is a PR open at
> > http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't
> > been any motion/thoughts posted on it since its creation almost one
> > year ago.
> >
> > I'm aware that on Solaris, hot spare replacement is handled by a few
> > Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug
> > into the Solaris FMA (Fault Management Architecture). Have there been
> > any thoughts on porting these over or getting something similar
> > running within FreeBSD? With all of the recent SATA/SAS CAM hotplug
> > work now committed, it would be nice to have automatic replacement of
> > hot spares with a future hot-replacement of the failed drive.
> >
> > On the other side, I'd be interested in hearing if anyone has had
> > success in rolling their own scripted solution: i.e. something which
> > polls 'zpool status' looking for failed drives and performing
> > hot-spare replacements automatically.
>
> You don't have to exactly poll it. See /etc/devd.conf:
>
> # Sample ZFS problem reports handling.
> notify 10 {
>         match "system" "ZFS";
>         match "type" "zpool";
>         action "logger -p kern.err 'ZFS: failed to load zpool $pool'";
> };
>
> notify 10 {
>         match "system" "ZFS";
>         match "type" "vdev";
>         action "logger -p kern.err 'ZFS: vdev failure, zpool=$pool type=$type'";
> };
>
> notify 10 {
>         match "system" "ZFS";
>         match "type" "data";
>         action "logger -p kern.warn 'ZFS: zpool I/O failure, zpool=$pool error=$zio_err'";
> };
>
> notify 10 {
>         match "system" "ZFS";
>         match "type" "io";
>         action "logger -p kern.warn 'ZFS: vdev I/O failure, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size error=$zio_err'";
> };
>
> notify 10 {
>         match "system" "ZFS";
>         match "type" "checksum";
>         action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
> };
>
> I don't really know if these notifications actually work since I don't
> have hot-plug test machines, but if they do, this looks like a decent
> starting point.

Thanks for the suggestions. I received a similar one from someone else.
If I get time to build a ZFS lab machine then I will certainly try these
out and provide feedback on how they work.
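For the scripted approach Steve asks about, here is a sketch that parses the
text of "zpool status" and pairs failed devices with available spares. It
only builds "zpool replace" command lines and runs nothing itself.
Assumptions: a single pool in the output, the state names listed in
FAILED_STATES, and function names invented for this example. It has not been
tried against a real failing pool.

```python
# Device states treated as "failed" (an assumption, adjust as needed).
FAILED_STATES = {"FAULTED", "UNAVAIL", "REMOVED"}
# Names of grouping vdevs that are not physical devices themselves.
GROUP_PREFIXES = ("mirror", "raidz", "spare", "replacing")


def parse_zpool_status(text):
    """Return (pool, failed_devices, available_spares) from status text."""
    pool = None
    failed, spares = [], []
    in_config = in_spares = False
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool = fields[1]
            in_config = in_spares = False
        elif fields[0] == "config:":
            in_config = True
        elif fields[0] == "errors:":
            in_config = in_spares = False
        elif in_config and fields[0] == "spares":
            in_spares = True
        elif in_config and len(fields) >= 2:
            name, state = fields[0], fields[1]
            if in_spares:
                if state == "AVAIL":
                    spares.append(name)
            elif (state in FAILED_STATES and name != pool
                  and not name.startswith(GROUP_PREFIXES)):
                failed.append(name)
    return pool, failed, spares


def replacement_commands(text):
    """Build one "zpool replace" command per failed device and free spare."""
    pool, failed, spares = parse_zpool_status(text)
    return ["zpool replace {} {} {}".format(pool, bad, spare)
            for bad, spare in zip(failed, spares)]
```

A cron job, or a devd action like the ones quoted above, could feed the
output of "zpool status" into replacement_commands() and log or run the
result. Printing the commands first and reviewing them by hand seems the
safer way to start.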
Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap
On Tue, Mar 09, 2010 at 09:00:49AM +0800, Linghua Tseng wrote:

> Here is the output of `diff -u /usr/src/etc/nsswitch.conf /etc/nsswitch.conf'.
> --- /usr/src/etc/nsswitch.conf 2010-03-08 09:04:25.0 +0800
> +++ /etc/nsswitch.conf 2010-03-08 18:01:08.0 +0800
> @@ -1,13 +1,13 @@
>  #
>  # nsswitch.conf(5) - name service switch configuration file
> -# $FreeBSD: src/etc/nsswitch.conf,v 1.1.10.1 2009/08/03 08:13:06 kensmith Exp $
> +# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
>  #
>  group: compat
> -group_compat: nis
> +group_compat: ldap nis
>  hosts: files dns
>  networks: files
>  passwd: compat
> -passwd_compat: nis
> +passwd_compat: ldap nis
>  shells: files
>  services: compat
>  services_compat: nis
>
> The line `+:*' has already put into /etc/master.passwd,
> and the line `+:*::' has already put into /etc/group.

I may be completely wrong (I can't seem to find the source), and I don't know if it is the source of your problem, but I recall it being reported that 'passwd_compat' and 'group_compat' require a *single* source entry.

--
greg byshenk - gbysh...@byshenk.net - Leiden, NL
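To check a configuration against the single-source recollection above, something like the following sketch could be used. It assumes that recollection is correct (the poster himself was unsure), the function names are invented for this example, and POSTED is simply the relevant part of the nsswitch.conf from the quoted diff.

```python
import re

# The relevant entries from the modified /etc/nsswitch.conf quoted in the thread.
POSTED = """\
group: compat
group_compat: ldap nis
hosts: files dns
networks: files
passwd: compat
passwd_compat: ldap nis
shells: files
services: compat
services_compat: nis
"""


def compat_sources(conf_text):
    """Map each *_compat database to the list of sources configured for it."""
    result = {}
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        database = database.strip()
        if database.endswith("_compat"):
            # Remove bracketed criteria such as [notfound=return] before counting.
            sources = re.sub(r"\[[^\]]*\]", " ", sources)
            result[database] = sources.split()
    return result


def multi_source_compat(conf_text):
    """Return the *_compat databases that do not list exactly one source."""
    return sorted(db for db, srcs in compat_sources(conf_text).items()
                  if len(srcs) != 1)


if __name__ == "__main__":
    print(multi_source_compat(POSTED))
```

Run against the posted file, this flags group_compat and passwd_compat, the two lines that list both ldap and nis.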
Re: Cron output mail lost with update to RELENG_7
> Date: Fri, 5 Mar 2010 12:33:09 -0800
> From: Jeremy Chadwick
> Sender: owner-freebsd-sta...@freebsd.org
>
> On Fri, Mar 05, 2010 at 11:32:47AM -0800, Kevin Oberman wrote:
> > I have discovered a problem with the mail sent by cron jobs (I refer
> > only to logs, not invocations of mail from scripts.) They never are
> > delivered.
> > Mar 5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient addresses found in message header
> > Mar 5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient addresses found in message header
> > Mar 5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient addresses found in message header
> > Mar 5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient addresses found in message header
> >
> > This showed up when I upgraded the system to RELENG_7 yesterday. My
> > previous install was RELENG_7 of May 2, 2009 and it delivered the logs
> > without any problems. No other changes were made. postfix was 2.6.5.
> >
> > I have this same issue on all 8.0 systems I have, but I was blaming a
> > fault in postfix config. Now I realize that this is not the problem.
> >
> > I really don't know quite where to look for this. Any clues would be
> > appreciated.
>
> I don't have this issue on any of our RELENG_7 or RELENG_8 systems, all
> of which use postfix and WITHOUT_SENDMAIL in /etc/src.conf.
>
> It sounds like cron is trying to spawn something like mail(1) (more
> likely /usr/sbin/sendmail; would have to look at the code) and passing
> it either incorrect flags or actual content within the header itself,
> e.g. a missing To: line.
>
> Since postfix is involved, have you verified your /etc/mail
> configuration to make sure mailwrapper is referring to the correct
> postfix binaries?
>
> The only other thing I can think of would be, possibly, some sort of
> cronjob root has (either crontab -l or /etc/crontab) which makes use of
> the MAILTO environment variable. See cron(8) for what I'm talking
> about.
>
> You might have to run cron in debug mode (see -x flag; your argument
> list will probably be quite long :-) ) to see what it's doing.
> Otherwise truss or ktrace might be the only way to track down what's
> going on underneath.

After a lot of testing, I created a dummy sendmail that simply captured the arguments and the data from STDIN.

    #!/usr/local/bin/perl
    open OUT, ">/home/oberman/cronout.txt";
    foreach (@ARGV) {print OUT "$_\n";}
    print OUT "Mailcat ran!\n";
    sleep 5;
    while (<STDIN>) {
        print OUT $_;
    }
    close OUT;

It looks like cron is sending an empty message. I see MAILARG of '-FCronDaemon -odi -oem -oi -t' but that is followed by EOF with no content at all.

I'm looking at the cron source, but I am baffled for the moment. I see no recent updates to cron in RELENG_7, though there are in RELENG_8.

I'm running out of ideas.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
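The postfix error fits the empty input seen here: with -t, sendmail takes the recipients from the message headers, so a message with no headers at all has no recipients. A small sketch of that check, using Python's standard email module; the function name and the sample message are made up for illustration and are not taken from cron or postfix.

```python
from email import message_from_string


def recipients_from_headers(raw_message):
    """Collect the addresses a '-t' style invocation would look for.

    Only the To, Cc and Bcc headers are consulted; an empty message
    therefore yields an empty list.
    """
    msg = message_from_string(raw_message)
    found = []
    for header in ("To", "Cc", "Bcc"):
        found.extend(msg.get_all(header, []))
    return found


if __name__ == "__main__":
    # Hypothetical example of what a cron message could look like.
    normal = "From: root (Cron Daemon)\nTo: root\nSubject: test\n\noutput\n"
    print(recipients_from_headers(normal))
    # What the dummy sendmail in this thread captured: nothing at all.
    print(recipients_from_headers(""))
```

Feeding the captured cronout.txt content through such a check would show whether the headers are missing entirely or only the recipient lines.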
Re: Many processes stuck in zfs
> Date: Tue, 9 Mar 2010 21:53:55 +1100
> From: Peter Jeremy
> Sender: owner-freebsd-sta...@freebsd.org
>
> On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote:
> >Over the past couple of months, I've more or less regularly observed
> >machines having more and more processes stuck in the zfs wchan. The
> >processes never recover from that,
>
> How long have you waited?
>
> There seems to be a problem with low free memory handling that causes ZFS
> to turn into cold molasses. The work-around is to run a program that
> allocates a decent size chunk of memory and then exits. The original
> suggestion was something like:
> perl -e '@x = (0) x 100;'
> I've written a short program that allocates and dirties ~100MB and then
> exits and run it from cron.

Sigh! I found it. I build my systems without NIS and I had the stock nsswitch.conf file. Fixed.

/me banging my head against the desk.

Thanks!
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
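The quoted work-around (a program that allocates and dirties about 100MB, then exits) could look something like the sketch below. Peter's actual program was not posted, so this is only a guess at its shape; the function name and the 4096-byte page size are assumptions, and whether it really helps ZFS is the claim made in the thread, not something shown here.

```python
def allocate_and_dirty(size_bytes, page_size=4096):
    """Allocate size_bytes and write one byte per page so every page is touched.

    Returns the number of bytes allocated. The memory is released when the
    buffer goes out of scope (at the latest when the process exits).
    """
    buf = bytearray(size_bytes)          # zero-filled allocation
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 1                  # dirty the page so it is really backed
    return len(buf)


if __name__ == "__main__":
    # Roughly the ~100MB mentioned in the thread.
    allocate_and_dirty(100 * 1024 * 1024)
```

Run from cron, the script exits right after the loop, which hands the memory back to the system.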
Re: Many processes stuck in zfs
Sigh. My brain is fried. I replied to the wrong thread. Please ignore this.
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

> Date: Tue, 09 Mar 2010 16:18:25 -0800
> From: "Kevin Oberman"
> Sender: owner-freebsd-sta...@freebsd.org
>
> > Date: Tue, 9 Mar 2010 21:53:55 +1100
> > From: Peter Jeremy
> > Sender: owner-freebsd-sta...@freebsd.org
> >
> > On 2010-Mar-09 10:15:53 +0100, Stefan Bethke wrote:
> > >Over the past couple of months, I've more or less regularly observed
> > >machines having more and more processes stuck in the zfs wchan. The
> > >processes never recover from that,
> >
> > How long have you waited?
> >
> > There seems to be a problem with low free memory handling that causes ZFS
> > to turn into cold molasses. The work-around is to run a program that
> > allocates a decent size chunk of memory and then exits. The original
> > suggestion was something like:
> > perl -e '@x = (0) x 100;'
> > I've written a short program that allocates and dirties ~100MB and then
> > exits and run it from cron.
>
> Sigh! I found it. I build my systems without NIS and I had the stock
> nsswitch.conf file. Fixed.
>
> /me banging my head against the desk.
>
> Thanks!
> --
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: ober...@es.net Phone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751
ugen kernel module?
In FreeBSD7 there was a ugen.ko kernel module and I could use apcupsd with USB devices, but in FreeBSD there is now no such module. How can I use an APC power supply with a USB interface (I mean with the apcupsd port)?
Re: ugen kernel module?
In the last episode (Mar 10):
> In FreeBSD7 there was ugen.ko kernel module and I can use apcupsd with USB
> devices, but in FreeBSD there is no such module, how can I use APC power
> supply with usb interface (I mean usage of the apcupsd port)?

It's built into the usb subsystem now. All USB devices (including USB hubs and devices controlled by other drivers) now have a ugen device. Try running "usbconfig list" to show them. I bet your UPS has just moved to a different ugen number.

--
Dan Nelson
dnel...@allantgroup.com
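To pick the UPS out of the "usbconfig list" output mentioned above, a sketch along these lines might help. The SAMPLE lines are hypothetical and only imitate the usual "ugenX.Y: <description> at usbusN, ..." layout; the device description, the function name and the search keyword are assumptions made for this example.

```python
import re

# Hypothetical output in the style of `usbconfig list`; not from a real system.
SAMPLE = """\
ugen0.1: <UHCI root HUB Intel> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=SAVE
ugen0.2: <Back-UPS ES 550 American Power Conversion> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON
"""

# Matches the device node and the description between angle brackets.
DEVICE_LINE = re.compile(r"^(ugen\d+\.\d+):\s*<([^>]*)>", re.MULTILINE)


def find_ugen(listing, keyword):
    """Return (node, description) pairs whose description contains keyword."""
    keyword = keyword.lower()
    return [(node, desc) for node, desc in DEVICE_LINE.findall(listing)
            if keyword in desc.lower()]


if __name__ == "__main__":
    for node, desc in find_ugen(SAMPLE, "American Power Conversion"):
        print(node, desc)
```

The node name found this way (ugen0.2 in the made-up sample) is what would have to be compared with the device configured for apcupsd.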
Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap
Thanks. I have tried to modify my /etc/nsswitch.conf to:

group: compat
group_compat: ldap
hosts: files dns
networks: files
passwd: compat
passwd_compat: ldap
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files

But the problem still occurs.

--
From: "Greg Byshenk"
Sent: Wednesday, March 10, 2010 3:11 AM
To: "Linghua Tseng"
Cc: "Peter C. Lai" ;
Subject: Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap

On Tue, Mar 09, 2010 at 09:00:49AM +0800, Linghua Tseng wrote:

Here is the output of `diff -u /usr/src/etc/nsswitch.conf /etc/nsswitch.conf'.
--- /usr/src/etc/nsswitch.conf 2010-03-08 09:04:25.0 +0800
+++ /etc/nsswitch.conf 2010-03-08 18:01:08.0 +0800
@@ -1,13 +1,13 @@
 #
 # nsswitch.conf(5) - name service switch configuration file
-# $FreeBSD: src/etc/nsswitch.conf,v 1.1.10.1 2009/08/03 08:13:06 kensmith Exp $
+# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
 #
 group: compat
-group_compat: nis
+group_compat: ldap nis
 hosts: files dns
 networks: files
 passwd: compat
-passwd_compat: nis
+passwd_compat: ldap nis
 shells: files
 services: compat
 services_compat: nis

The line `+:*' has already put into /etc/master.passwd,
and the line `+:*::' has already put into /etc/group.

I may be completely wrong (I can't seem to find the source), and I don't know if it is the source of your problem, but I recall it being reported that 'passwd_compat' and 'group_compat' require a *single* source entry.

--
greg byshenk - gbysh...@byshenk.net - Leiden, NL