date:20100309

ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Dominic Fandrey

ntpd tracks interface updates; however, it does not re-query
servers when they occur. This happened less than an hour ago
at my university: the notebook boots and is not connected
to anything:

 9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
 9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 Disabled
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 Enabled
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
 9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 Enabled
 9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for interface updates
 9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
 9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from /var/db/ntpd.drift
 9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', giving up on it
 9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', giving up on it

So ntpd has given up on all the servers listed in the ntp.conf file.

I then connect to the wireless network and log into two VPNs:

 9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled
 9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 193.196.120.15#123 Enabled
 9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 Enabled

Over interface #8 some of the servers are actually available, but
ntpq -p still states:
No association ID's returned

Only when I restart ntpd does it operate as expected:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 zit-net2.uni-pa .STEP.          16 u    -  512    0    0.000    0.000   0.000
 alpha.rueckgr.a .STEP.          16 u    -  512    0    0.000    0.000   0.000
 ntp.goneco.de   .STEP.          16 u    -  512    0    0.000    0.000   0.000
+proxy4.rz.uni-k 129.13.64.17     2 u   30  128  2712.9372.530   1.891
+proxy2.rz.uni-k 129.13.64.17     2 u   58  128  3753.593   -8.981   1.837
*proxy1.rz.uni-k 129.13.64.17     2 u   15  128  2713.2978.244   1.487
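
The situation described above (ntpd gives up on host names at startup, then new interfaces appear) could be detected from the log so that a restart can be triggered. The following is only a minimal sketch: the function name and the idea of driving a restart from the log are assumptions for illustration, not something ntpd provides. The patterns match the log lines quoted in this message once wrapped lines are joined.

```python
import re

# Patterns for the two kinds of ntpd log lines quoted in this message.
GAVE_UP = re.compile(r"couldn't resolve `([^']+)', giving up on it")
NEW_IFACE = re.compile(r"Listening on interface #\d+ (\w+), \S+ Enabled")


def restart_candidates(log_lines):
    """Return (hosts, interfaces).

    hosts:      host names ntpd gave up on (hypothetical helper output)
    interfaces: interfaces enabled after the first failed lookup
    """
    hosts = []
    interfaces = []
    for line in log_lines:
        match = GAVE_UP.search(line)
        if match:
            if match.group(1) not in hosts:
                hosts.append(match.group(1))
            continue
        match = NEW_IFACE.search(line)
        # Only interfaces that show up after a failed lookup are interesting.
        if match and hosts:
            interfaces.append(match.group(1))
    return hosts, interfaces


if __name__ == "__main__":
    sample = [
        " 9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 Enabled",
        " 9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up on it",
        " 9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 192.168.75.58#123 Enabled",
    ]
    hosts, interfaces = restart_candidates(sample)
    if hosts and interfaces:
        print("ntpd restart may be needed:", hosts, interfaces)
```

If both lists are non-empty, ntpd is probably in the state shown above, and restarting it is the only remedy reported in this message.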

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: is dtrace usable?

2010-03-09 Thread Alexander Leidinger

Quoting John Baldwin  (from Mon, 8 Mar 2010 10:00:12 -0500):

On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:

On Sat, 6 Mar 2010, Alexander Leidinger wrote:

>> Take a look at the DTrace configuration information here:
>>
>>http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
>
> I've just reread it (despite the fact that I already used it). Some
> comments:
>
> Last time I tried, I didn't see any problems by adding
>  makeoptions WITH_CTF=yes
> to the kernel config instead of doing
>  make WITH_CTF=1 kernel
>
> Did I miss something, and if not, shouldn't we tell about the
> makeoptions part instead (a kernel rebuild later will not cause
> trouble when someone forgets to do the WITH_CTF part as it is already
> in the kernel makefile)?

I'll leave John to answer this one, CC line broadened.

I would be very surprised if 'makeoptions WITH_CTF=yes' worked.  The many
times I and others have tried it, it did not work.  Do you have a log of your
build showing the ctfconvert and ctfmerge command lines?

I do not have a log around; it has been a while since I did something
with dtrace (a year ago), and I cannot remember always adding
WITH_CTF on a build (but it was about SDT probes, not FBT probes, in
case it matters).

I had a look again: WITH_CTF=yes is one of the first lines in the
Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)". "make
-V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is
strange, isn't it? I would expect NO_CTF to be undefined. Is this a
bug in make, or a bug in the man page? Neither the description of
the different kinds of variables nor the description of "defined"
mentions anything that explains this behavior.
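
One possible explanation for the observation above is evaluation order: make reads sys.mk before the Makefile, so a test in sys.mk cannot see a variable that the Makefile only assigns later, while a command-line variable is already defined at that point. The toy model below is a sketch of that ordering only. It assumes the body of the quoted "if !defined(WITH_CTF)" test sets NO_CTF, which the message does not show; the function name is hypothetical.

```python
def build_variables(command_line_vars, makefile_vars):
    """Toy model of variable definition order in make (not real make)."""
    # Command-line variables exist before any file is read.
    variables = dict(command_line_vars)

    # Step 1: sys.mk is read first. Assumed content of the quoted test:
    #   .if !defined(WITH_CTF)  ->  NO_CTF = 1
    if "WITH_CTF" not in variables:
        variables["NO_CTF"] = "1"

    # Step 2: the Makefile (including lines generated from makeoptions)
    # is read afterwards. Command-line values take precedence.
    for name, value in makefile_vars.items():
        variables.setdefault(name, value)
    return variables


if __name__ == "__main__":
    # makeoptions WITH_CTF=yes: both variables end up defined.
    print(build_variables({}, {"WITH_CTF": "yes"}))
    # make WITH_CTF=1 kernel: NO_CTF is never set.
    print(build_variables({"WITH_CTF": "1"}, {}))
```

Under these assumptions the first case reproduces what "make -V" reports above (WITH_CTF is "yes" and NO_CTF is "1"), and the second case leaves NO_CTF undefined.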

The current kernel on my test machine is compiled with "makeoptions  
...", and a "dtrace -l" causes the watchdog to trigger a panic. Ugh...  
that's not a nice behavior. :(

Bye,
Alexander.

--
Adding sound to movies would be like
putting lipstick on the Venus de Milo.
-- actress Mary Pickford, 1925

http://www.Leidinger.net    Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137

Re: Survey results very helpful, thanks! (was: Re: net.inet.tcp.timer_race: does anyone have a non-zero value?)

2010-03-09 Thread Doug Hardie


On 8 March 2010, at 12:33, Robert Watson wrote:

> 
> On Mon, 8 Mar 2010, Doug Hardie wrote:
> 
>> I run a number of 4 core systems with em interfaces.  These are production 
>> systems that are unmanned and located a long way from me.  Under unusual 
>> conditions it can take up to 6 hours to get there.  I have been waiting to 
>> switch to 8.0 because of the discussions on the em device and now it sounds 
>> like I had better just skip 8.x and wait for 9.  7.2 is working just fine.
> 
> Not sure that any information in this survey thread should be relevant to 
> that decision.  This race has existed since before FreeBSD, having appeared 
> in the original BSD network stack, and is just as present in FreeBSD 7.x as 
> 8.x or 9.x.  When I learned about the race during the early 7.x development 
> cycle, I added a counter/statistic to measure how much it happened in 
> practice, but was not able to exercise it in my testing, and so left the 
> counter in to appear in 7.0 and later so that we could perform this survey as 
> core counts/etc increase.
> 
> The two likely outcomes were "it is never exercised" and "it is exercised but 
> only very infrequently", neither really justifying the quite complex change 
> to correct it given requirements at the time.  On-going development work on 
> the virtual network stack is what justifies correcting the bug at this point, 
> moving from detecting and handling the race to preventing it from occurring as 
> an invariant.  The motivation here, BTW, is that we'd like to eliminate the 
> type-stable storage requirement for connection state (which ensures that 
> memory once used for a connection block is only ever used for connection 
> blocks in the future), allowing memory to be fully freed when a virtual 
> network stack is destroyed.  Using type-stable storage helped address this 
> bug, but was primarily present to reduce the overhead of monitoring using 
> netstat(1).  We'll now need to use a slightly more expensive solution (true 
> reference counts) in that context, although in practice it will almost 
> certainly be an unmeasurable cost.
> 
> Which is to say that while there might be something in the em/altq/... thread 
> to reasonably lead you to avoid 8.0, nothing in the TCP timer race thread 
> should do so, since it affects 7.2 just as much as 8.0.  Even if you do see a 
> non-zero counter, that's not a matter for operational concern, just useful 
> from the perspective of a network stack developer to understanding timing and 
> behaviors in the stack.  :-)


Thanks for the complete explanation.  I don't believe the ALTQ issue will 
affect me.  I am not currently using it and do not expect to in the near 
future.  In addition, there was a posting that a fix for at least part of that 
will be added in a week or so.  Given all that, it appears it's time to start the 
planning/testing process for 8.



Fwd: Re: NFS Client error

2010-03-09 Thread Giulio Ferro

Thanks for your kind reply, I'm forwarding it there...

 Original Message 
Subject:Re: NFS Client error
Date:   Mon, 08 Mar 2010 23:59:29 +0100
From:   vol...@vwsoft.com
To: Giulio Ferro 
CC: freebsd-hack...@freebsd.org, freebsd-...@freebsd.org

On 03/08/10 12:16, Giulio Ferro wrote:

 Freebsd 8 stable amd64

 It mounts different file systems by NFS (with locking) on a
 data server directly connected (gigabit) to the server

 Apache is running in several jails on those nfs folders.

 Now and then I get a huge slow-down. When I look in the logs
 I get thousands of lines like these:
 Mar  5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
 Mar  5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on
 signal 11

 What should I do?

Giulio,

it seems this is anyhow not related to network (nfs) operations. It's
looking like a problem in the VM. I think it makes sense to have a look
at the httpd.core file if the binary has been linked with debugging
symbols turned on. Also I think at first, it may not hurt to look at
vmstat -m output.
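
Before digging into core files, it may help to see whether the errors come from a few processes or from many. The sketch below counts "pager read error" lines per process name and pid; the function name is hypothetical and the pattern is taken from the two log lines quoted above.

```python
import re
from collections import Counter

# Matches the kernel message quoted above, capturing pid and process name.
PAGER_ERROR = re.compile(r"vm_fault: pager read error, pid (\d+) \((\w+)\)")


def count_pager_errors(log_lines):
    """Count pager read errors per (process name, pid)."""
    counts = Counter()
    for line in log_lines:
        match = PAGER_ERROR.search(line)
        if match:
            counts[(match.group(2), int(match.group(1)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Mar  5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)",
        "Mar  5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on signal 11",
    ]
    for (name, pid), n in count_pager_errors(sample).most_common():
        print(name, pid, n)
```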

You may want to change ${subject} and post to stable@ to drive more
attention to your problem.

Volker


Re: Fwd: Re: NFS Client error

2010-03-09 Thread Daniel Braniss

> Thanks for your kind reply, I'm forwarding it there...
> 
> 
>  Original Message 
> Subject:  Re: NFS Client error
> Date: Mon, 08 Mar 2010 23:59:29 +0100
> From: vol...@vwsoft.com
> To:   Giulio Ferro 
> CC:   freebsd-hack...@freebsd.org, freebsd-...@freebsd.org
> 
> 
> 
> On 03/08/10 12:16, Giulio Ferro wrote:
> >  Freebsd 8 stable amd64
> >
> >  It mounts different file systems by NFS (with locking) on a
> >  data server directly connected (gigabit) to the server
> >
> >  Apache is running in several jails on those nfs folders.
> >
> >  Now and then I get a huge slow-down. When I look in the logs
> >  I get thousands of lines like these:
> >  Mar  5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
> >  Mar  5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on
> >  signal 11
> >
> >
> >  What should I do?

If the binary (httpd) is on an nfs server, then this is what usually
happens if the binary got modified

my 2c
danny

> 
> Giulio,
> 
> it seems this is anyhow not related to network (nfs) operations. It's
> looking like a problem in the VM. I think it makes sense to have a look
> at the httpd.core file if the binary has been linked with debugging
> symbols turned on. Also I think at first, it may not hurt to look at
> vmstat -m output.
> 
> You may want to change ${subject} and post to stable@ to drive more
> attention to your problem.
> 
> Volker
> 
> 

Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke

Over the past couple of months, I've more or less regularly observed machines 
having more and more processes stuck in the zfs wchan.  The processes never 
recover from that, and trying to reboot only gets the entire system stuck, 
without any console messages.  I can enter the debugger, and I have saved a 
couple of dumps.

The situation seems to be triggered by zfs receive'ing snapshots from the 
sister machine (both synchronize their active ZFS filesystems to each other, 
using zfs send and zfs receive).  It appears it's the receiving causing trouble.

Both machines run 8-stable from mid-February, with a single-disk ZFS pool, with 
ARC limited to 512M, prefetch and ZIL disabled via loader.conf.

What should I be looking at to further diagnose?


Thanks,
Stefan

-- 
Stefan Bethke   Fon +49 151 14070811



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: ZFS hot spares

2010-03-09 Thread Pawel Jakub Dawidek

On Mon, Mar 08, 2010 at 01:06:10PM -0500, Steve Polyack wrote:
> ZFS in FreeBSD lacks at least one major feature from the Solaris 
> version: hot spares.   There is a PR open at 
> http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't been 
> any motion/thoughts posted on it since its creation almost one year ago.
> 
> I'm aware that on Solaris, hot spare replacement is handled by a few 
> Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug 
> into the Solaris FMA (Fault Management Architecture).  Have there been 
> any thoughts on porting these over or getting something similar running 
> within FreeBSD?  With all of the recent SATA/SAS CAM hotplug work now 
> committed, it would be nice to have automatic replacement of hot spares 
> with a future hot-replacement of the failed drive.
> 
> On the other side, I'd be interested in hearing if anyone has had 
> success in rolling their own scripted solution: i.e. something which 
> polls 'zpool status' looking for failed drives and performing hot-spare 
> replacements automatically.

Currently FreeBSD's ZFS sends various events to devd. It should be
possible to implement some scripts (or maybe reuse
zfs-retire/zfs-diagnose?) to perform 'zpool replace' when a disk
disappears, etc. This shouldn't be very hard, modulo bugs in FreeBSD/ZFS:
because this functionality is unused, it hasn't been tested.
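
The polling approach asked about in the quoted message could look roughly like the sketch below. It only parses 'zpool status' text and builds 'zpool replace' command lines; it does not run anything. The function name, the set of states treated as failed, and the sample device names are assumptions for illustration, and real 'zpool status' output may contain sections this parser does not handle.

```python
# Device states treated as "failed" here (an assumption; adjust as needed).
FAILED_STATES = {"FAULTED", "UNAVAIL", "REMOVED"}


def plan_replacements(pool, status_text):
    """Pair failed devices with available spares from 'zpool status' text.

    Returns a list of argument lists such as
    ["zpool", "replace", pool, failed_device, spare_device].
    """
    failed, spares = [], []
    in_config = False
    in_spares = False
    for line in status_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "NAME":
            in_config = True          # device table starts after this header
            continue
        if not in_config:
            continue
        if fields[0] == "errors:":
            break                     # end of the device table
        if len(fields) == 1:
            # Section header such as "spares", "logs" or "cache".
            in_spares = fields[0] == "spares"
            continue
        name, state = fields[0], fields[1]
        if in_spares:
            if state == "AVAIL":
                spares.append(name)
        elif (state in FAILED_STATES and name != pool
              and not name.startswith(("mirror", "raidz"))):
            failed.append(name)
    return [["zpool", "replace", pool, bad, spare]
            for bad, spare in zip(failed, spares)]


if __name__ == "__main__":
    # Illustrative input only; pool and device names are made up.
    sample = """  pool: tank
 state: DEGRADED
config:

        NAME        STATE     READ WRITE CKSUM
        tank        DEGRADED     0     0     0
          mirror    DEGRADED     0     0     0
            da0     ONLINE       0     0     0
            da1     UNAVAIL      0     0     0  cannot open
        spares
          da2       AVAIL

errors: No known data errors
"""
    for command in plan_replacements("tank", sample):
        print(" ".join(command))
```

A cron job or a devd hook could feed the real command output into such a function and review or execute the resulting commands; whether to execute them automatically is a policy decision left open here.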

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Fwd: Re: NFS Client error

2010-03-09 Thread Giulio Ferro

On 09.03.2010 10:14, Daniel Braniss wrote:

Thanks for your kind reply, I'm forwarding it there...

 Original Message 
Subject:Re: NFS Client error
Date:   Mon, 08 Mar 2010 23:59:29 +0100
From:   vol...@vwsoft.com
To: Giulio Ferro
CC: freebsd-hack...@freebsd.org, freebsd-...@freebsd.org

On 03/08/10 12:16, Giulio Ferro wrote:

  Freebsd 8 stable amd64

  It mounts different file systems by NFS (with locking) on a
  data server directly connected (gigabit) to the server

  Apache is running in several jails on those nfs folders.

  Now and then I get a huge slow-down. When I look in the logs
  I get thousands of lines like these:
  Mar  5 11:50:52 virt2 kernel: vm_fault: pager read error, pid 46487 (httpd)
  Mar  5 11:50:52 virt2 kernel: pid 46487 (httpd), uid 80: exited on
  signal 11

  What should I do?

If the binary (httpd) is on an nfs server, then this is what usually
happens if the binary got modified

Nope. The binary is in the jails on the local machine.
Only the configuration dir (etc/apache22) and data dir (www)
are on the nfs server.

+-----------------+
| NFS CLIENT      |
| jail 1 : httpd  |
| jail 2 : httpd  | --> NFS SERVER
| jail 3 : httpd  |
| ...             |
+-----------------+

Giulio.

my 2c
danny

Giulio,

it seems this is anyhow not related to network (nfs) operations. It's
looking like a problem in the VM. I think it makes sense to have a look
at the httpd.core file if the binary has been linked with debugging
symbols turned on. Also I think at first, it may not hurt to look at
vmstat -m output.

You may want to change ${subject} and post to stable@ to drive more
attention to your problem.

Volker


Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Ian Smith

On Tue, 9 Mar 2010, Dominic Fandrey wrote:
 > ntpd tracks interface updates, however it does not requery
 > servers, when they occur. This was less than an hour ago,
 > at my university, the notebook boots and is not connected
 > to anything:
 > 
 >  9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
 >  9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 0.0.0.0#123 
 > Disabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 
 > Disabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 192.168.1.12#123 
 > Enabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 
 > Enabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 
 > Enabled
 >  9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for 
 > interface updates
 >  9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
 >  9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from 
 > /var/db/ntpd.drift
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving up 
 > on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving up 
 > on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving up 
 > on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
 > giving up on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
 > giving up on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', 
 > giving up on it
 >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
 >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', 
 > giving up on it
 > 
 > So ntpd has given up on all the servers listed in the ntp.conf file.

Yes, but it looks more like name service that's not operating, ntpd 
seems to be doing its best but can't resolve the hostnames?

 > I then proceed to connect to the wireless network and proceed to log
 > into two VPNs:
 > 
 >  9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 
 > 192.168.75.58#123 Enabled
 >  9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 
 > 193.196.120.15#123 Enabled
 >  9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 141.3.162.67#123 
 > Enabled
 > 
 > Over interface #8 some of the servers are actually available, but
 > ntpq -p still states:
 > No association ID's returned
 > 
 > Only when I restart ntpd, it operates as expected:
 >  remote   refid  st t when poll reach   delay   offset  
 > jitter
 > ==
 >  zit-net2.uni-pa .STEP.  16 u-  51200.0000.000   
 > 0.000
 >  alpha.rueckgr.a .STEP.  16 u-  51200.0000.000   
 > 0.000
 >  ntp.goneco.de   .STEP.  16 u-  51200.0000.000   
 > 0.000
 > +proxy4.rz.uni-k 129.13.64.17 2 u   30  128  2712.9372.530   
 > 1.891
 > +proxy2.rz.uni-k 129.13.64.17 2 u   58  128  3753.593   -8.981   
 > 1.837
 > *proxy1.rz.uni-k 129.13.64.17 2 u   15  128  2713.2978.244   
 > 1.487

I've always had to restart named after losing / regaining an interface, 
most noticeably after a suspend/resume (eg a low battery suspend), so I 
run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
issue perhaps, though I don't see why restarting only ntpd would fix it.

HTH, Ian

Re: Many processes stuck in zfs

2010-03-09 Thread Frédéric Bour

On Tue, 9 Mar 2010 10:15:53 +0100,
Stefan Bethke wrote:

> Over the past couple of months, I've more or less regularly observed
> machines having more and more processes stuck in the zfs wchan.  The
> processes never recover from that, and trying to reboot only gets the
> entire system stuck, without any console messages.  I can enter the
> debugger, and I have saved a couple of dumps.
> 
> The situation seems to be triggered by zfs receive'ing snapshots from
> the sister machine (both synchronize their active ZFS filesystems to
> each other, using zfs send and zfs receive).  It appears it's the
> receiving causing trouble.
> 
> Both machines run 8-stable from mid-February, with a single-disk ZFS
> pool, with ARC limited to 512M, prefetch and ZIL disabled via
> loader.conf.
> 
> What should I be looking at to further diagnose?
> 
> 
> Thanks,
> Stefan
> 

Hi,

I encounter almost the same problem with an 8-STABLE build
from the same time. When working a lot on files inside ~/,
the directory gets locked, and any command trying to access it (from "ls"
to applications reading their configuration files) gets stuck.

The system is an amd64 desktop computer with 4GiB of memory, and
vfs.zfs.prefetch_disable is set to 0.

Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Jeremy Chadwick

On Tue, Mar 09, 2010 at 09:27:35PM +1100, Ian Smith wrote:
> On Tue, 9 Mar 2010, Dominic Fandrey wrote:
>  > ntpd tracks interface updates, however it does not requery
>  > servers, when they occur. This was less than an hour ago,
>  > at my university, the notebook boots and is not connected
>  > to anything:
>  > 
>  >  9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
>  >  9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 
> 0.0.0.0#123 Disabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 
> Disabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 
> 192.168.1.12#123 Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 
> Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 
> Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for 
> interface updates
>  >  9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
>  >  9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from 
> /var/db/ntpd.drift
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', 
> giving up on it
>  > 
>  > So ntpd has given up on all the servers listed in the ntp.conf file.
> 
> Yes, but it looks more like name service that's not operating, ntpd 
> seems to be doing its best but can't resolve the hostnames?
> 
>  > I then proceed to connect to the wireless network and proceed to log
>  > into two VPNs:
>  > 
>  >  9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 
> 192.168.75.58#123 Enabled
>  >  9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 
> 193.196.120.15#123 Enabled
>  >  9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 
> 141.3.162.67#123 Enabled
>  > 
>  > Over interface #8 some of the servers are actually available, but
>  > ntpq -p still states:
>  > No association ID's returned
>  > 
>  > Only when I restart ntpd, it operates as expected:
>  >  remote   refid  st t when poll reach   delay   offset  
> jitter
>  > 
> ==
>  >  zit-net2.uni-pa .STEP.  16 u-  51200.0000.000   
> 0.000
>  >  alpha.rueckgr.a .STEP.  16 u-  51200.0000.000   
> 0.000
>  >  ntp.goneco.de   .STEP.  16 u-  51200.0000.000   
> 0.000
>  > +proxy4.rz.uni-k 129.13.64.17 2 u   30  128  2712.9372.530   
> 1.891
>  > +proxy2.rz.uni-k 129.13.64.17 2 u   58  128  3753.593   -8.981   
> 1.837
>  > *proxy1.rz.uni-k 129.13.64.17 2 u   15  128  2713.2978.244   
> 1.487
> 
> I've always had to restart named after losing / regaining an interface, 
> most noticeably after a suspend/resume (eg a low battery suspend), so I 
> run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
> issue perhaps, though I don't see why restarting only ntpd would fix it.

named is supposed to auto-probe for interfaces at a specific interval;
see the "interface-interval" option.  I forget what the default is,
but on our servers we explicitly disable it by setting it to 0.

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |


Re: Many processes stuck in zfs

2010-03-09 Thread Peter Jeremy

On 2010-Mar-09 10:15:53 +0100, Stefan Bethke  wrote:
>Over the past couple of months, I've more or less regularly observed machines 
>having more and more processes stuck in the zfs wchan.  The processes never 
>recover from that,

How long have you waited?

There seems to be a problem with low free memory handling that causes ZFS
to turn into cold molasses.  The work-around is to run a program that
allocates a decent size chunk of memory and then exits.  The original
suggestion was something like:
perl -e '@x = (0) x 100;'
I've written a short program that allocates and dirties ~100MB and then
exits, and I run it from cron.
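
A program of the kind described above might look like the following sketch. It is not the author's program: the function name and the choice to write one byte per page are assumptions. Writing to every page is what makes the memory "dirty", so that the system actually has to provide it.

```python
import mmap


def allocate_and_dirty(size_bytes=100 * 1024 * 1024, page_size=mmap.PAGESIZE):
    """Allocate size_bytes and write one byte per page.

    Returns the number of pages touched. The memory is released when the
    buffer goes out of scope, or at the latest when the process exits.
    """
    buf = bytearray(size_bytes)
    touched = 0
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 0xFF   # touching the page forces it to be backed
        touched += 1
    return touched


if __name__ == "__main__":
    allocate_and_dirty()
```

Run from cron at a suitable interval, the script allocates roughly 100MB, touches it, and exits, which is the work-around described in this message.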

-- 
Peter Jeremy


freebsd 7.2stable em0: discard frame w/o packet header

2010-03-09 Thread Sergey Radashevskiy

The system panics with the error "kernel: em0: discard frame w/o packet header".
Crash dump one
kgdb /boot/kernel/kernel /var/crash/vmcore.5
Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 00
fault virtual address   = 0xd9c6f105
fault code  = supervisor read, page not present
instruction pointer = 0x20:0x808a021f
stack pointer   = 0x28:0x85909c10
frame pointer   = 0x28:0x85909c44
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 14 (swi4: clock sio)
trap number = 12
panic: page fault
cpuid = 0
Uptime: 11h51m2s
Physical memory: 1936 MB
Dumping 331 MB: 316 300 284 268 252 236 220 204 188 172 156 140 124 108 92 76 60 44 28 12

(kgdb) where
#0  doadump () at pcpu.h:196
#1  0x8069a618 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x8069a8f5 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x8095f294 in trap_fatal (frame=0x85909bd0, eva=3653693701) at /usr/src/sys/i386/i386/trap.c:950
#4  0x8095f4fd in trap_pfault (frame=0x85909bd0, usermode=0, eva=3653693701) at /usr/src/sys/i386/i386/trap.c:863
#5  0x8095febd in trap (frame=0x85909bd0) at /usr/src/sys/i386/i386/trap.c:541
#6  0x8094479b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0x808a021f in uma_zfree_arg (zone=0x86a3b000, item=0x8cdcd580, udata=0x868fa000) at /usr/src/sys/vm/uma_core.c:2253
#8  0x8075492b in ng_netflow_expire (arg=0x868fa000) at /usr/src/sys/netgraph/netflow/netflow.c:215
#9  0x806ac441 in softclock (dummy=0x0) at /usr/src/sys/kern/kern_timeout.c:274
#10 0x806789c9 in ithread_loop (arg=0x85c9d3d0) at /usr/src/sys/kern/kern_intr.c:1181
#11 0x80675271 in fork_exit (callout=0x8067882d , arg=0x85c9d3d0, frame=0x85909d38) at /usr/src/sys/kern/kern_fork.c:811
#12 0x80944810 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271

Crash dump two

kgdb /boot/kernel.old/kernel /var/crash/vmcore.4

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address   = 0xc
fault code  = supervisor read, page not present
instruction pointer = 0x20:0x806ec864
stack pointer   = 0x28:0x85912a64
frame pointer   = 0x28:0x85912a90
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, def32 1, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 3 (ng_queue1)
trap number = 12
panic: page fault
cpuid = 1
Uptime: 12h4m33s
Physical memory: 1936 MB
Dumping 154 MB: 139 123 107 91 75 59 43 27 11

(kgdb) where
#0  doadump () at pcpu.h:196
#1  0x8069a618 in boot (howto=260) at /usr/src/sys/kern/kern_shutdown.c:418
#2  0x8069a8f5 in panic (fmt=Variable "fmt" is not available.
) at /usr/src/sys/kern/kern_shutdown.c:574
#3  0x8095f294 in trap_fatal (frame=0x85912a24, eva=12) at /usr/src/sys/i386/i386/trap.c:950
#4  0x8095f4fd in trap_pfault (frame=0x85912a24, usermode=0, eva=12) at /usr/src/sys/i386/i386/trap.c:863
#5  0x8095febd in trap (frame=0x85912a24) at /usr/src/sys/i386/i386/trap.c:541
#6  0x8094479b in calltrap () at /usr/src/sys/i386/i386/exception.s:166
#7  0x806ec864 in m_copym (m=0x0, off0=1496, len=1496, wait=1) at /usr/src/sys/kern/uipc_mbuf.c:539
#8  0x807912a3 in ip_fragment (ip=0x862146d6, m_frag=0x85912b5c, mtu=1500, if_hwassist_flags=38, sw_csum=1) at /usr/src/sys/netinet/ip_output.c:731
#9  0x80791f4d in ip_output (m=0x86214600, opt=0x0, ro=0x85912b30, flags=32, imo=0x0, inp=0x88518bf4) at /usr/src/sys/netinet/ip_output.c:570
#10 0x8079332f in rip_output (m=0x86341d00, so=0x87a86d00, dst=3540521226) at /usr/src/sys/netinet/raw_ip.c:408
#11 0x807933f4 in rip_send (so=0x87a86d00, flags=0, m=0x86341d00, nam=0x0, control=0x0, td=0x85cff480) at /usr/src/sys/netinet/raw_ip.c:880
#12 0x806f590d in sosend_generic (so=0x87a86d00, addr=0x0, uio=0x0, top=0x86341d00, control=0x0, flags=0, td=0x85cff480) at /usr/src/sys/kern/uipc_socket.c:1243
#13 0x806f1767 in sosend (so=0x87a86d00, addr=0x0, uio=0x0, top=0x86341d00, control=0x0, flags=0, td=0x85cff480) at /usr/src/sys/kern/uipc_socket.c:1285
#14 0x807652e4 in ng_ksocket_rcvdata (hook=0x89164180, item=0x86925a50) at /usr/src/sys/netgraph/ng_ksocket.c:927
#15 0x8075986f in ng_apply_item (node=0x88f33900, item=0x86925a50, rw=0) at /usr/src/sys/netgraph/ng_base.c:2336
#16 0x8075acf5 in ngthread (arg=0x0) at /usr/src/sys/netgraph/ng_base.c:3304
#17 0x80675271 in fork_exit (callout=0x8075aaa9 , arg=0x0, frame=0x85912d38) at /usr/src/sys/kern/kern_fork.c:811
#18 0x80944810 in fork_trampoline () at /usr/src/sys/i386/i386/exception.s:271


Re: Many processes stuck in zfs

2010-03-09 Thread Pawel Jakub Dawidek

On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> Over the past couple of months, I've more or less regularly observed machines 
> having more and more processes stuck in the zfs wchan.  The processes never 
> recover from that, and trying to reboot only gets the entire system stuck, 
> without any console messages.  I can enter the debugger, and I have saved a 
> couple of dumps.
> 
> The situation seems to be triggered by zfs receive'ing snapshots from the 
> sister machine (both synchronize their active ZFS filesystems to each other, 
> using zfs send and zfs receive).  It appears it's the receiving causing 
> trouble.
> 
> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, 
> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
> 
> What should I be looking at to further diagnose?

What kind of hardware do you have there? There is a 3-way deadlock that I
have a fix for; it would be hard to trigger on single- or dual-core machines.

Feel free to try the fix:

http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Jeremy Chadwick

On Tue, Mar 09, 2010 at 02:17:48PM +0100, Dominic Fandrey wrote:
> On 09/03/2010 11:27, Ian Smith wrote:
> > On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> >  > ntpd tracks interface updates, however it does not requery
> >  > servers, when they occur. This was less than an hour ago,
> >  > at my university, the notebook boots and is not connected
> >  > to anything:
> >  > 
> >  >  9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
> >  >  9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 
> > 0.0.0.0#123 Disabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 
> > Disabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 
> > 192.168.1.12#123 Enabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 
> > Enabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 
> > Enabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 
> > Enabled
> >  >  9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for 
> > interface updates
> >  >  9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
> >  >  9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from 
> > /var/db/ntpd.drift
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving 
> > up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving 
> > up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving 
> > up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> > giving up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> > giving up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', 
> > giving up on it
> >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
> >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', 
> > giving up on it
> >  > 
> >  > So ntpd has given up on all the servers listed in the ntp.conf file.
> > 
> > Yes, but it looks more like name service that's not operating, ntpd 
> > seems to be doing its best but can't resolve the hostnames?
> 
> Why would I have named running on a notebook? This is a notebook,
> which is not connected to the internet.
> 
> >  > I then proceed to connect to the wireless network and proceed to log
> >  > into two VPNs:
> >  > 
> >  >  9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 
> > 192.168.75.58#123 Enabled
> >  >  9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 
> > 193.196.120.15#123 Enabled
> >  >  9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 
> > 141.3.162.67#123 Enabled
> >  > 
> >  > Over interface #8 some of the servers are actually available, but
> >  > ntpq -p still states:
> >  > No association ID's returned
> >  > 
> >  > Only when I restart ntpd, it operates as expected:
> >  >  remote   refid  st t when poll reach   delay   offset  
> > jitter
> >  > 
> > ==
> >  >  zit-net2.uni-pa .STEP.  16 u-  51200.0000.000   
> > 0.000
> >  >  alpha.rueckgr.a .STEP.  16 u-  51200.0000.000   
> > 0.000
> >  >  ntp.goneco.de   .STEP.  16 u-  51200.0000.000   
> > 0.000
> >  > +proxy4.rz.uni-k 129.13.64.17 2 u   30  128  2712.9372.530   
> > 1.891
> >  > +proxy2.rz.uni-k 129.13.64.17 2 u   58  128  3753.593   -8.981   
> > 1.837
> >  > *proxy1.rz.uni-k 129.13.64.17 2 u   15  128  2713.2978.244   
> > 1.487
> > 
> > I've always had to restart named after losing / regaining an interface, 
> > most noticeably after a suspend/resume (eg a low battery suspend), so I 
> > run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
> > issue perhaps, though I don't see why restarting only ntpd would fix it.
> 
> As I said, named doesn't run at all. When the notebook gets an
> internet connection, ntpd recognizes this. It somehow doesn't
> occur to it, though, that it might be able to resolve the
> servers, now.

I believe this is the problem.  Note that you'll need to add an SSL
cert. exception for this site due to them using self-signed certs.

https://support.ntp.org/bugs/show_bug.cgi?id=987

-- 
| Jeremy Chadwick   j...@

Re: Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke

On 09.03.2010 at 11:53, Peter Jeremy wrote:

> On 2010-Mar-09 10:15:53 +0100, Stefan Bethke  wrote:
>> Over the past couple of months, I've more or less regularly observed 
>> machines having more and more processes stuck in the zfs wchan.  The 
>> processes never recover from that,
> 
> How long have you waited?

Many hours, sometimes up to 48 hours (when I didn't notice the stuck processes 
at first).

> There seems to be a problem with low free memory handling that causes ZFS
> to turn into cold molasses.  The work-around is to run a program that
> allocates a decent size chunk of memory and then exits.  The original
> suggestion was something like:
>   perl -e '@x = (0) x 100;'
> I've written a short program that allocates and dirties ~100MB and then
> exits and run it from cron.

I'll try that the next time I encounter the stuck processes.
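
A minimal sketch of the kind of program Peter describes (allocate a chunk,
dirty it, exit), in case it helps anyone who wants to try the work-around.
This is not his actual program: the name dirty_memory, the 100 MB default
and the 4096-byte page size are assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Allocate a chunk of memory, touch every page, then exit."""


def dirty_memory(size_mb=100, page_size=4096):
    """Allocate size_mb megabytes and write one byte in every page.

    A bytearray starts out zero-filled, so the kernel may not back it
    with real pages until they are written; the loop forces that.
    Returns the number of bytes allocated.
    """
    buf = bytearray(size_mb * 1024 * 1024)
    for offset in range(0, len(buf), page_size):
        buf[offset] = 1  # dirty this page
    return len(buf)


if __name__ == "__main__":
    # The memory is handed back to the system when the process exits.
    dirty_memory()
```

Run from cron at whatever interval suits the machine; whether ~100 MB is
enough to shake things loose presumably depends on the workload.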

I'm recording ZFS ARC stats with munin, would I be able to identify such a low 
memory situation from there?  Would it make sense to monitor other stats?


Thanks,
Stefan

-- 
Stefan Bethke   Fon +49 151 14070811



___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: Many processes stuck in zfs

2010-03-09 Thread Borja Marcos


On Mar 9, 2010, at 1:58 PM, Pawel Jakub Dawidek wrote:

>>> What kind of hardware do you have there? There is 3-way deadlock I've a
>>> fix for which would be hard to trigger on single or dual core machines.
>>> 
>>> Feel free to try the fix:
>>> 
>>> http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
>> 
>> Maybe related to the deadlock I reported when I was receiving an incremental 
>> snapshot while the target dataset was being read?
> 
> Could be. This deadlock is in general related to zfs recv functionality.

Aye aye, Sir

set fingers -position crossed

testing :)





Borja


Re: is dtrace usable?

2010-03-09 Thread John Baldwin

On Tuesday 09 March 2010 3:27:09 am Alexander Leidinger wrote:
> Quoting John Baldwin  (from Mon, 8 Mar 2010 10:00:12 -0500):
> 
> > On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:
> >> On Sat, 6 Mar 2010, Alexander Leidinger wrote:
> >>
> >> >> Take a look at the DTrace configuration information here:
> >> >>
> >> >>http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
> >> >
> >> > I've just reread it (despite the fact that I already used it). Some
> >> > comments:
> >> >
> >> > Last time I tried, I didn't see any problems by adding
> >> >  makeoptions WITH_CTF=yes
> >> > to the kernel config instead of doing
> >> >  make WITH_CTF=1 kernel
> >> >
> >> > Did I miss something, and if not, shouldn't we tell about the
> >> > makeoptions part instead (a kernel rebuild later will not cause
> >> > trouble when someone forgets to do the WITH_CTF part as it is already
> >> > in the kernel makefile)?
> >>
> >> I'll leave John to answer this one, CC line broadened.
> >
> > I would be very surprised if 'makeoptions WITH_CTF=yes' worked.  The many
> > times I and others have tried it it did not work.  Do you have a log of your
> > build showing the ctfconvert and ctfmerge command lines?
> 
> I do not have a log around, it has been a while since I did something  
> with dtrace (a year ago) and I can not remember that I always added  
> WITH_CTF on a build (but it was about SDT probes, not FBT probes, in  
> case it matters).
> 
> I had a look again, WITH_CTF=yes is one of the first lines in the  
> Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)". "make  
> -V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is  
> strange, isn't it? I would expect that NO_CTF is undefined. Is this a  
> bug in make, or a bug in the man page (neither in the description of  
> the different kinds of variables, nor in the description of "defined"  
> is something mentioned explaining this behavior).

It is defined behavior.  From the 2nd and 3rd paragraphs of the make(1)
manual page:

 First of all, the initial list of specifications will be read from the
 system makefile, sys.mk, unless inhibited with the -r option.  The stan-
 dard sys.mk as shipped with FreeBSD also handles make.conf(5), the
 default path to which can be altered via the make variable __MAKE_CONF.

 Then the first of BSDmakefile, makefile, and Makefile that can be found
 in the current directory, object directory (see .OBJDIR), or search path
 (see the -I option) will be read for the main list of dependency specifi-
 cations.  A different makefile or list of them can be supplied via the -f
 option(s).  Finally, if the file .depend can be found in any of the
 aforesaid locations, it will also be read (see mkdep(1)).

From this you can see that sys.mk is included and parsed before 'Makefile',
so the WITH_CTF=yes is not set until after sys.mk has been parsed.
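
The ordering can be illustrated with a toy model. This is only a sketch of
the parse order, not of make(1) itself: the functions sys_mk and
kernel_makefile are hypothetical stand-ins, and the assumption that sys.mk
sets NO_CTF=1 when WITH_CTF is undefined is inferred from the "make -V"
output quoted above.

```python
def parse_in_order(files, command_line=None):
    """Evaluate toy makefiles in order and return the final variables.

    Each "makefile" is a function that reads and updates the shared
    variable table. Command-line variables are present before the
    first file is read, mirroring 'make WITH_CTF=1 kernel'.
    """
    variables = dict(command_line or {})
    for makefile in files:
        makefile(variables)
    return variables


def sys_mk(variables):
    # Assumed stand-in for the check in /usr/share/mk/sys.mk:
    #   .if !defined(WITH_CTF)  ->  NO_CTF=1
    if "WITH_CTF" not in variables:
        variables["NO_CTF"] = "1"


def kernel_makefile(variables):
    # Stand-in for the line that 'makeoptions WITH_CTF=yes' puts
    # near the top of the generated kernel Makefile.
    variables["WITH_CTF"] = "yes"


# makeoptions case: sys.mk runs first and never sees WITH_CTF.
result = parse_in_order([sys_mk, kernel_makefile])
print(result)  # {'NO_CTF': '1', 'WITH_CTF': 'yes'}

# Command-line case: WITH_CTF is already defined when sys.mk runs.
print(parse_in_order([sys_mk], command_line={"WITH_CTF": "1"}))
```

The first call ends up with both WITH_CTF=yes and NO_CTF=1, which matches
what "make -V WITH_CTF" and "make -V NO_CTF" reported.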

-- 
John Baldwin

Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Dominic Fandrey

On 09/03/2010 11:27, Ian Smith wrote:
> On Tue, 9 Mar 2010, Dominic Fandrey wrote:
>  > ntpd tracks interface updates, however it does not requery
>  > servers, when they occur. This was less than an hour ago,
>  > at my university, the notebook boots and is not connected
>  > to anything:
>  > 
>  >  9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
>  >  9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 
> 0.0.0.0#123 Disabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 
> Disabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 
> 192.168.1.12#123 Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 
> Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 127.0.0.1#123 
> Enabled
>  >  9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for 
> interface updates
>  >  9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
>  >  9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from 
> /var/db/ntpd.drift
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', giving 
> up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp1.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp1.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp3.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp3.rz.uni-karlsruhe.de', 
> giving up on it
>  >  9 Mar 08:07:20 ntpd[1542]: host name not found: ntp4.rz.uni-karlsruhe.de
>  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `ntp4.rz.uni-karlsruhe.de', 
> giving up on it
>  > 
>  > So ntpd has given up on all the servers listed in the ntp.conf file.
> 
> Yes, but it looks more like name service that's not operating, ntpd 
> seems to be doing its best but can't resolve the hostnames?

Why would I have named running on a notebook? This is a notebook,
which is not connected to the internet.

>  > I then proceed to connect to the wireless network and proceed to log
>  > into two VPNs:
>  > 
>  >  9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 
> 192.168.75.58#123 Enabled
>  >  9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 
> 193.196.120.15#123 Enabled
>  >  9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 
> 141.3.162.67#123 Enabled
>  > 
>  > Over interface #8 some of the servers are actually available, but
>  > ntpq -p still states:
>  > No association ID's returned
>  > 
>  > Only when I restart ntpd, it operates as expected:
>  >  remote   refid  st t when poll reach   delay   offset  
> jitter
>  > 
> ==
>  >  zit-net2.uni-pa .STEP.  16 u-  51200.0000.000   
> 0.000
>  >  alpha.rueckgr.a .STEP.  16 u-  51200.0000.000   
> 0.000
>  >  ntp.goneco.de   .STEP.  16 u-  51200.0000.000   
> 0.000
>  > +proxy4.rz.uni-k 129.13.64.17 2 u   30  128  2712.9372.530   
> 1.891
>  > +proxy2.rz.uni-k 129.13.64.17 2 u   58  128  3753.593   -8.981   
> 1.837
>  > *proxy1.rz.uni-k 129.13.64.17 2 u   15  128  2713.2978.244   
> 1.487
> 
> I've always had to restart named after losing / regaining an interface, 
> most noticeably after a suspend/resume (eg a low battery suspend), so I 
> run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
> issue perhaps, though I don't see why restarting only ntpd would fix it.

As I said, named doesn't run at all. When the notebook gets an
internet connection, ntpd recognizes this. It somehow doesn't
occur to it, though, that it might be able to resolve the
servers, now.
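
To make the expectation concrete, here is a rough sketch of the behaviour I
have in mind: keep the unresolved names around and try them again whenever a
new interface shows up. This is not how ntpd is implemented; resolve_pending
and its parameters are hypothetical names, and only socket.getaddrinfo is a
real library call.

```python
import socket


def resolve_pending(pending, resolver=socket.getaddrinfo, port=123):
    """Try to resolve every host name in pending.

    Returns (resolved, still_pending): resolved maps each host name to
    a sorted list of address strings, still_pending lists the names
    that failed and should be retried on the next interface event.
    """
    resolved = {}
    still_pending = []
    for host in pending:
        try:
            infos = resolver(host, port)
        except socket.gaierror:
            # Keep the name instead of giving up on it for good.
            still_pending.append(host)
            continue
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the address string for IPv4 and IPv6.
        resolved[host] = sorted({info[4][0] for info in infos})
    return resolved, still_pending
```

A daemon would call this once at startup and again each time the routing
socket reports a new interface, until still_pending is empty.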

-- 
A: Because it fouls the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail? 

Re: Many processes stuck in zfs

2010-03-09 Thread Stefan Bethke

On 09.03.2010 at 13:29, Pawel Jakub Dawidek wrote:

> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed 
>> machines having more and more processes stuck in the zfs wchan.  The 
>> processes never recover from that, and trying to reboot only gets the entire 
>> system stuck, without any console messages.  I can enter the debugger, and I 
>> have saved a couple of dumps.
>> 
>> The situation seems to be triggered by zfs receive'ing snapshots from the 
>> sister machine (both synchronize their active ZFS filesystems to each other, 
>> using zfs send and zfs receive).  It appears it's the receiving causing 
>> trouble.
>> 
>> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, 
>> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>> 
>> What should I be looking at to further diagnose?
> 
> What kind of hardware do you have there? There is 3-way deadlock I've a
> fix for which would be hard to trigger on single or dual core machines.

FreeBSD lokschuppen.zs64.net 8.0-STABLE FreeBSD 8.0-STABLE #24: Sat Feb 13 
11:20:03 UTC 2010 r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT  
amd64
Copyright (c) 1992-2010 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.0-STABLE #24: Sat Feb 13 11:20:03 UTC 2010
r...@lokschuppen.zs64.net:/usr/obj/usr/src/sys/EISENBOOT amd64
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Duo CPU E7300  @ 2.66GHz (2666.65-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x10676  Stepping = 6
  Features=0xbfebfbff
  Features2=0x8e39d
  AMD Features=0x20100800
  AMD Features2=0x1
  TSC: P-state invariant
real memory  = 4294967296 (4096 MB)
avail memory = 4081422336 (3892 MB)


> Feel free to try the fix:
> 
>   http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

I'll give it a shot on one of the two boxes.


Stefan

-- 
Stefan Bethke   Fon +49 151 14070811




Re: Many processes stuck in zfs

2010-03-09 Thread Pawel Jakub Dawidek

On Tue, Mar 09, 2010 at 01:57:07PM +0100, Borja Marcos wrote:
> 
> On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:
> 
> > On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
> >> Over the past couple of months, I've more or less regularly observed 
> >> machines having more and more processes stuck in the zfs wchan.  The 
> >> processes never recover from that, and trying to reboot only gets the 
> >> entire system stuck, without any console messages.  I can enter the 
> >> debugger, and I have saved a couple of dumps.
> >> 
> >> The situation seems to be triggered by zfs receive'ing snapshots from the 
> >> sister machine (both synchronize their active ZFS filesystems to each 
> >> other, using zfs send and zfs receive).  It appears it's the receiving 
> >> causing trouble.
> >> 
> >> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, 
> >> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
> >> 
> >> What should I be looking at to further diagnose?
> > 
> > What kind of hardware do you have there? There is 3-way deadlock I've a
> > fix for which would be hard to trigger on single or dual core machines.
> > 
> > Feel free to try the fix:
> > 
> > http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch
> 
> Maybe related to the deadlock I reported when I was receiving an incremental 
> snapshot while the target dataset was being read?

Could be. This deadlock is in general related to zfs recv functionality.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Many processes stuck in zfs

2010-03-09 Thread Borja Marcos


On Mar 9, 2010, at 1:29 PM, Pawel Jakub Dawidek wrote:

> On Tue, Mar 09, 2010 at 10:15:53AM +0100, Stefan Bethke wrote:
>> Over the past couple of months, I've more or less regularly observed 
>> machines having more and more processes stuck in the zfs wchan.  The 
>> processes never recover from that, and trying to reboot only gets the entire 
>> system stuck, without any console messages.  I can enter the debugger, and I 
>> have saved a couple of dumps.
>> 
>> The situation seems to be triggered by zfs receive'ing snapshots from the 
>> sister machine (both synchronize their active ZFS filesystems to each other, 
>> using zfs send and zfs receive).  It appears it's the receiving causing 
>> trouble.
>> 
>> Both machines run 8-stable from mid-February, with a single-disk ZFS pool, 
>> with ARC limited to 512M, prefetch and ZIL disabled via loader.conf.
>> 
>> What should I be looking at to further diagnose?
> 
> What kind of hardware do you have there? There is 3-way deadlock I've a
> fix for which would be hard to trigger on single or dual core machines.
> 
> Feel free to try the fix:
> 
>   http://people.freebsd.org/~pjd/patches/zfs_3way_deadlock.patch

Maybe related to the deadlock I reported when I was receiving an incremental 
snapshot while the target dataset was being read?





Borja.


Re: is dtrace usable?

2010-03-09 Thread Alexander Leidinger

Quoting John Baldwin  (from Tue, 9 Mar 2010 07:47:00 -0500):

> On Tuesday 09 March 2010 3:27:09 am Alexander Leidinger wrote:
> > Quoting John Baldwin  (from Mon, 8 Mar 2010 10:00:12 -0500):
> >
> > > On Saturday 06 March 2010 11:00:12 am Robert Watson wrote:
> > >> On Sat, 6 Mar 2010, Alexander Leidinger wrote:
> > >>
> > >> >> Take a look at the DTrace configuration information here:
> > >> >>
> > >> >> http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dtrace.html
> > >> >
> > >> > I've just reread it (despite the fact that I already used it). Some
> > >> > comments:
> > >> >
> > >> > Last time I tried, I didn't see any problems by adding
> > >> >  makeoptions WITH_CTF=yes
> > >> > to the kernel config instead of doing
> > >> >  make WITH_CTF=1 kernel
> > >> >
> > >> > Did I miss something, and if not, shouldn't we tell about the
> > >> > makeoptions part instead (a kernel rebuild later will not cause
> > >> > trouble when someone forgets to do the WITH_CTF part as it is already
> > >> > in the kernel makefile)?
> > >>
> > >> I'll leave John to answer this one, CC line broadened.
> > >
> > > I would be very surprised if 'makeoptions WITH_CTF=yes' worked.  The many
> > > times I and others have tried it it did not work.  Do you have a log of your
> > > build showing the ctfconvert and ctfmerge command lines?
> >
> > I do not have a log around, it has been a while since I did something
> > with dtrace (a year ago) and I can not remember that I always added
> > WITH_CTF on a build (but it was about SDT probes, not FBT probes, in
> > case it matters).
> >
> > I had a look again, WITH_CTF=yes is one of the first lines in the
> > Makefile, and /usr/share/mk/sys.mk has "if !defined(WITH_CTF)". "make
> > -V WITH_CTF" shows "yes", but "make -V NO_CTF" shows "1". This is
> > strange, isn't it? I would expect that NO_CTF is undefined. Is this a
> > bug in make, or a bug in the man page (neither in the description of
> > the different kinds of variables, nor in the description of "defined"
> > is something mentioned explaining this behavior).
>
> It is defined behavior.  From the 2nd and 3rd paragraphs of the make(1)
> manual page:
>
>  First of all, the initial list of specifications will be read from the
>  system makefile, sys.mk, unless inhibited with the -r option.  The stan-
>  dard sys.mk as shipped with FreeBSD also handles make.conf(5), the
>  default path to which can be altered via the make variable __MAKE_CONF.
>
>  Then the first of BSDmakefile, makefile, and Makefile that can be found
>  in the current directory, object directory (see .OBJDIR), or search path
>  (see the -I option) will be read for the main list of dependency specifi-
>  cations.  A different makefile or list of them can be supplied via the -f
>  option(s).  Finally, if the file .depend can be found in any of the
>  aforesaid locations, it will also be read (see mkdep(1)).
>
> From this you can see that sys.mk is included and parsed before 'Makefile',
> so the WITH_CTF=yes is not set until after sys.mk has been parsed.

I think we need to find a different solution for this. The need to  
specify WITH_CTF at the command line is very error prone. :(

Bye,
Alexander.

--
Every time I lose weight, it finds me again!

http://www.Leidinger.net Alexander @ Leidinger.net: PGP ID = B0063FE7
http://www.FreeBSD.org   netchild @ FreeBSD.org  : PGP ID = 72077137

Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Jeremy Chadwick

On Tue, Mar 09, 2010 at 05:30:45AM -0800, Jeremy Chadwick wrote:
> On Tue, Mar 09, 2010 at 02:17:48PM +0100, Dominic Fandrey wrote:
> > On 09/03/2010 11:27, Ian Smith wrote:
> > > On Tue, 9 Mar 2010, Dominic Fandrey wrote:
> > >  > ntpd tracks interface updates, however it does not requery
> > >  > servers, when they occur. This was less than an hour ago,
> > >  > at my university, the notebook boots and is not connected
> > >  > to anything:
> > >  > 
> > >  >  9 Mar 08:07:17 ntpd[1510]: logging to file /var/log/ntpd
> > >  >  9 Mar 08:07:17 ntpd[1510]: precision = 2.234 usec
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #0 wildcard, 
> > > 0.0.0.0#123 Disabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #1 wildcard, ::#123 
> > > Disabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #2 bge0, 
> > > 192.168.1.12#123 Enabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #3 lo0, fe80::1#123 
> > > Enabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #4 lo0, ::1#123 
> > > Enabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on interface #5 lo0, 
> > > 127.0.0.1#123 Enabled
> > >  >  9 Mar 08:07:17 ntpd[1510]: Listening on routing socket on fd #26 for 
> > > interface updates
> > >  >  9 Mar 08:07:17 ntpd[1510]: kernel time sync status 2040
> > >  >  9 Mar 08:07:17 ntpd[1510]: frequency initialized 3.155 PPM from 
> > > /var/db/ntpd.drift
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 0.de.pool.ntp.org
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `0.de.pool.ntp.org', 
> > > giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 1.de.pool.ntp.org
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `1.de.pool.ntp.org', 
> > > giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 2.de.pool.ntp.org
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve `2.de.pool.ntp.org', 
> > > giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 
> > > ntp1.rz.uni-karlsruhe.de
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve 
> > > `ntp1.rz.uni-karlsruhe.de', giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 
> > > ntp1.rz.uni-karlsruhe.de
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve 
> > > `ntp1.rz.uni-karlsruhe.de', giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 
> > > ntp3.rz.uni-karlsruhe.de
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve 
> > > `ntp3.rz.uni-karlsruhe.de', giving up on it
> > >  >  9 Mar 08:07:20 ntpd[1542]: host name not found: 
> > > ntp4.rz.uni-karlsruhe.de
> > >  >  9 Mar 08:07:20 ntpd[1542]: couldn't resolve 
> > > `ntp4.rz.uni-karlsruhe.de', giving up on it
> > >  > 
> > >  > So ntpd has given up on all the servers listed in the ntp.conf file.
> > > 
> > > Yes, but it looks more like name service that's not operating, ntpd 
> > > seems to be doing its best but can't resolve the hostnames?
> > 
> > Why would I have named running on a notebook? This is a notebook,
> > which is not connected to the internet.
> > 
> > >  > I then proceed to connect to the wireless network and proceed to log
> > >  > into two VPNs:
> > >  > 
> > >  >  9 Mar 08:08:58 ntpd[1510]: Listening on interface #6 wlan0, 
> > > 192.168.75.58#123 Enabled
> > >  >  9 Mar 08:09:00 ntpd[1510]: Listening on interface #7 tun0, 
> > > 193.196.120.15#123 Enabled
> > >  >  9 Mar 08:09:04 ntpd[1510]: Listening on interface #8 tun1, 
> > > 141.3.162.67#123 Enabled
> > >  > 
> > >  > Over interface #8 some of the servers are actually available, but
> > >  > ntpq -p still states:
> > >  > No association ID's returned
> > >  > 
> > >  > Only when I restart ntpd, it operates as expected:
> > >  >  remote   refid  st t when poll reach   delay   offset 
> > >  jitter
> > >  > 
> > > ==
> > >  >  zit-net2.uni-pa .STEP.  16 u-  51200.0000.000 
> > >   0.000
> > >  >  alpha.rueckgr.a .STEP.  16 u-  51200.0000.000 
> > >   0.000
> > >  >  ntp.goneco.de   .STEP.  16 u-  51200.0000.000 
> > >   0.000
> > >  > +proxy4.rz.uni-k 129.13.64.17 2 u   30  128  2712.9372.530 
> > >   1.891
> > >  > +proxy2.rz.uni-k 129.13.64.17 2 u   58  128  3753.593   -8.981 
> > >   1.837
> > >  > *proxy1.rz.uni-k 129.13.64.17 2 u   15  128  2713.2978.244 
> > >   1.487
> > > 
> > > I've always had to restart named after losing / regaining an interface, 
> > > most noticeably after a suspend/resume (eg a low battery suspend), so I 
> > > run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
> > > issue perhaps, though I don't see why restarting only ntpd would fix it.
> > 
> > As I said, named doesn't run at all. When the notebook gets an
> > internet connection, ntpd recognizes this. It somehow doesn't
> > occur to it, though, that it might

Re: is dtrace usable?

2010-03-09 Thread Robert N. M. Watson

On Mar 9, 2010, at 2:16 PM, Alexander Leidinger wrote:

>> From this you can see that sys.mk is included and parsed before 'Makefile',
>> so the WITH_CTF=yes is not set until after sys.mk has been parsed.
> 
> I think we need to find a different solution for this. The need to specify 
> WITH_CTF at the command line is very error prone. :(

You are neither the first person to have made this observation, nor the first 
person to have failed to propose a solution in the form of a patch :-).

Robert
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: ntpd does not re-query servers, when a new interface appears

2010-03-09 Thread Ian Smith

On Tue, 9 Mar 2010, Jeremy Chadwick wrote:
 > On Tue, Mar 09, 2010 at 09:27:35PM +1100, Ian Smith wrote:
[..]
 > > Yes, but it looks more like name service that's not operating, ntpd 
 > > seems to be doing its best but can't resolve the hostnames?

Right smell, wrong pooch :)  Thanks for the pointer to the ntp buglist.

 > > I've always had to restart named after losing / regaining an interface, 
 > > most noticeably after a suspend/resume (eg a low battery suspend), so I 
 > > run /etc/rc.d/named restart from rc.resume.  This looks like a similar 
 > > issue perhaps, though I don't see why restarting only ntpd would fix it.
 > 
 > named is supposed to auto-probe for interfaces at a specific interval;
 > see the "interface-interval" option.  I forget what the default is,
 > but on our servers we explicitly disable it by setting it to 0.

// We have no dynamic interfaces, so BIND shouldn't need to
// poll for interface state {UP|DOWN}.
// (will this fix need to reload after suspend/resume?)
interface-interval 0;

It's rare, maybe twice a year, that this laptop cum server suspends from 
a 2hr+ lack of power - inverter failures, rewiring etc - so restarting 
named on resume makes more sense than constant iface polling 'in case'.

cheers, Ian

Re: ZFS hot spares

2010-03-09 Thread Steve Polyack


On 03/09/10 05:11, Ivan Voras wrote:

On 03/08/10 19:06, Steve Polyack wrote:

ZFS in FreeBSD lacks at least one major feature from the Solaris
version: hot spares. There is a PR open at
http://www.freebsd.org/cgi/query-pr.cgi?pr=134491, but there hasn't been
any motion/thoughts posted on it since its creation almost one year ago.

I'm aware that on Solaris, hot spare replacement is handled by a few
Solaris-specific daemons, zfs-retire and zfs-diagnose, which both plug
into the Solaris FMA (Fault Management Architecture). Have there been
any thoughts on porting these over or getting something similar running
within FreeBSD? With all of the recent SATA/SAS CAM hotplug work now
committed, it would be nice to have automatic replacement of hot spares
with a future hot-replacement of the failed drive.

On the other side, I'd be interested in hearing if anyone has had
success in rolling their own scripted solution: i.e. something which
polls 'zpool status' looking for failed drives and performing hot-spare
replacements automatically.
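
A rough sketch of the polling idea, in Python. It only parses the text of 'zpool status' and reports entries in a bad state; it does not perform any replacement. The set of states treated as failed, the sample output, and the device names (ad4, ad6, ad8) are assumptions for illustration, not taken from a real pool:

```python
# States treated as "failed" here are an assumption; adjust to taste.
BAD_STATES = {"FAULTED", "UNAVAIL", "REMOVED", "OFFLINE"}


def failed_devices(status_text):
    """Return (name, state) pairs for config entries in a bad state."""
    failed = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("config:"):
            in_config = True
            continue
        if stripped.startswith("errors:"):
            in_config = False
            continue
        fields = stripped.split()
        # Inside the config section the first two columns are NAME and STATE.
        if in_config and len(fields) >= 2 and fields[1] in BAD_STATES:
            failed.append((fields[0], fields[1]))
    return failed


# Hypothetical sample text, written by hand to resemble 'zpool status'.
SAMPLE = """\
  pool: tank
 state: DEGRADED
config:

\tNAME        STATE     READ WRITE CKSUM
\ttank        DEGRADED     0     0     0
\t  mirror    DEGRADED     0     0     0
\t    ad4     ONLINE       0     0     0
\t    ad6     FAULTED      0     0     0
\tspares
\t  ad8       AVAIL

errors: No known data errors
"""

if __name__ == "__main__":
    for name, state in failed_devices(SAMPLE):
        print(f"{name} is {state}")
```

On a real system the text could be captured with subprocess.run(["zpool", "status"], capture_output=True, text=True).stdout and a reported device handed to 'zpool replace', but deciding when that is safe to automate is the hard part.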


You don't have to exactly poll it. See /etc/devd.conf:

# Sample ZFS problem reports handling.
notify 10 {
    match "system"  "ZFS";
    match "type"    "zpool";
    action "logger -p kern.err 'ZFS: failed to load zpool $pool'";
};

notify 10 {
    match "system"  "ZFS";
    match "type"    "vdev";
    action "logger -p kern.err 'ZFS: vdev failure, zpool=$pool type=$type'";
};

notify 10 {
    match "system"  "ZFS";
    match "type"    "data";
    action "logger -p kern.warn 'ZFS: zpool I/O failure, zpool=$pool error=$zio_err'";
};

notify 10 {
    match "system"  "ZFS";
    match "type"    "io";
    action "logger -p kern.warn 'ZFS: vdev I/O failure, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size error=$zio_err'";
};

notify 10 {
    match "system"  "ZFS";
    match "type"    "checksum";
    action "logger -p kern.warn 'ZFS: checksum mismatch, zpool=$pool path=$vdev_path offset=$zio_offset size=$zio_size'";
};

I don't really know if these notifications actually work since I don't 
have hot-plug test machines, but if they do, this looks like a decent 
starting point.




Thanks for the suggestions.  I received a similar one from someone 
else.  If I get time to build a ZFS lab machine then I will certainly 
try these out and provide feedback on how they work.



Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap

2010-03-09 Thread Greg Byshenk

On Tue, Mar 09, 2010 at 09:00:49AM +0800, Linghua Tseng wrote:
 
> Here is the output of `diff -u /usr/src/etc/nsswitch.conf 
> /etc/nsswitch.conf'.
> --- /usr/src/etc/nsswitch.conf  2010-03-08 09:04:25.0 +0800
> +++ /etc/nsswitch.conf  2010-03-08 18:01:08.0 +0800
> @@ -1,13 +1,13 @@
> #
> # nsswitch.conf(5) - name service switch configuration file
> -# $FreeBSD: src/etc/nsswitch.conf,v 1.1.10.1 2009/08/03 08:13:06 kensmith Exp $
> +# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
> #
> group: compat
> -group_compat: nis
> +group_compat: ldap nis
> hosts: files dns
> networks: files
> passwd: compat
> -passwd_compat: nis
> +passwd_compat: ldap nis
> shells: files
> services: compat
> services_compat: nis
> 
> The line `+:*' has already been put into /etc/master.passwd,
> and the line `+:*::' has already been put into /etc/group.

I may be completely wrong (I can't seem to find the source), and I
don't know if it is the source of your problem, but I recall it being
reported that 'passwd_compat' and 'group_compat' require a *single*
source entry. 
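
Whether or not that turns out to be the cause, the condition is easy to check mechanically. A minimal sketch in Python that flags any '*_compat' database listing more than one source; the function name is made up for illustration, and the single-source rule itself is only the recollection above, not something confirmed here:

```python
import re


def multi_source_compat(text):
    """Map each *_compat database to its sources when it lists more than one."""
    problems = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        database, sources = line.split(":", 1)
        database = database.strip()
        # Drop bracketed criteria such as [notfound=return] before counting.
        names = re.sub(r"\[[^\]]*\]", " ", sources).split()
        if database.endswith("_compat") and len(names) > 1:
            problems[database] = names
    return problems


# The modified lines from the diff quoted above.
SAMPLE = """\
group: compat
group_compat: ldap nis
passwd: compat
passwd_compat: ldap nis
services: compat
services_compat: nis
"""

if __name__ == "__main__":
    for database, names in multi_source_compat(SAMPLE).items():
        print(f"{database} lists {len(names)} sources: {' '.join(names)}")
```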


-- 
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL

Re: Cron output mail lost with update to RELENG_7

2010-03-09 Thread Kevin Oberman

> Date: Fri, 5 Mar 2010 12:33:09 -0800
> From: Jeremy Chadwick 
> Sender: owner-freebsd-sta...@freebsd.org
> 
> On Fri, Mar 05, 2010 at 11:32:47AM -0800, Kevin Oberman wrote:
> > I have discovered a problem with the mail sent by cron jobs (I refer
> > only to logs, not invocations of mail from scripts.) They never are
> > delivered.
> > Mar  5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient 
> > addresses found in message header
> > 
> > This showed up when I upgraded the system to RELENG_7 yesterday. My
> > previous install was RELENG_7 of May 2, 2009 and it delivered the logs
> > without any problems. No other changes were made. postfix was 2.6.5.
> > 
> > I have this same issue on all 8.0 systems I have, but I was blaming a
> > fault in postfix config. Now I realize that this is not the problem.
> > 
> > I really don't know quite where to look for this. Any clues would be
> > appreciated. 
> 
> I don't have this issue on any of our RELENG_7 or RELENG_8 systems, all
> of which use postfix and WITHOUT_SENDMAIL in /etc/src.conf.
> 
> It sounds like cron is trying to spawn something like mail(1) (more
> likely /usr/sbin/sendmail; would have to look at the code) and passing
> it either incorrect flags or actual content within the header itself,
> e.g. a missing To: line.
> 
> Since postfix is involved, have you verified your /etc/mail
> configuration to make sure mailwrapper is referring to the correct
> postfix binaries?
> 
> The only other thing I can think of would be, possibly, some sort of
> cronjob root has (either crontab -l or /etc/crontab) which makes use of
> the MAILTO environment variable.  See cron(8) for what I'm talking
> about.
> 
> You might have to run cron in debug mode (see -x flag; your argument
> list will probably be quite long :-) ) to see what it's doing.
> Otherwise truss or ktrace might be the only way to track down what's
> going on underneath.

After a lot of testing, I created a dummy sendmail that simply captured
the arguments and the data from STDIN. 
#!/usr/local/bin/perl
# Dummy sendmail: log the arguments and whatever arrives on STDIN.
open OUT, ">/home/oberman/cronout.txt";
foreach (@ARGV) {print OUT "$_\n";}
print OUT "Mailcat ran!\n";
sleep 5;
while (<STDIN>) { print OUT $_; }
close OUT;

It looks like cron is sending an empty message. I see MAILARG of
'-FCronDaemon -odi -oem -oi -t' but that is followed by EOF with no
content at all.

I'm looking at the cron source, but I am baffled for the moment. I see
no recent updates to cron in RELENG_7, though there are in RELENG_8. I'm
running out of ideas.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net  Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751

Re: Many processes stuck in zfs

2010-03-09 Thread Kevin Oberman

> Date: Tue, 9 Mar 2010 21:53:55 +1100
> From: Peter Jeremy 
> Sender: owner-freebsd-sta...@freebsd.org
> 
> On 2010-Mar-09 10:15:53 +0100, Stefan Bethke  wrote:
> >Over the past couple of months, I've more or less regularly observed 
> >machines having more and more processes stuck in the zfs wchan.  The 
> >processes never recover from that,
> 
> How long have you waited?
> 
> There seems to be a problem with low free memory handling that causes ZFS
> to turn into cold molasses.  The work-around is to run a program that
> allocates a decent size chunk of memory and then exits.  The original
> suggestion was something like:
>   perl -e '@x = (0) x 100;'
> I've written a short program that allocates and dirties ~100MB and then
> exits and run it from cron.
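
The program mentioned above is not shown in the thread. A minimal sketch of the same idea in Python, assuming a 4096-byte page size and that writing one byte per page is enough to dirty it (both are assumptions, not something from the original post):

```python
def allocate_and_dirty(size_bytes, page_size=4096):
    """Allocate size_bytes and write one byte in every page.

    Returns the number of pages touched. page_size is an assumed value.
    """
    buf = bytearray(size_bytes)  # zero-filled allocation
    touched = 0
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 1  # the write makes the page resident and dirty
        touched += 1
    return touched


if __name__ == "__main__":
    # Roughly 100MB, as in the description above; freed when the process exits.
    print(allocate_and_dirty(100 * 1024 * 1024))
```

Run from cron, a script like this would apply the same memory pressure at intervals; whether it actually helps the stuck processes is exactly what the thread is trying to establish.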

Sigh! I found it. I build my systems without NIS and I had the stock
nsswitch.conf file. Fixed.

/me banging my head against the desk.

Thanks!
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net  Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751

Re: Many processes stuck in zfs

2010-03-09 Thread Kevin Oberman

Sigh. My brain is fried. I replied to the wrong thread. Please ignore this.
-- 
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: ober...@es.net  Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751

> Date: Tue, 09 Mar 2010 16:18:25 -0800
> From: "Kevin Oberman" 
> Sender: owner-freebsd-sta...@freebsd.org
> 
> > Date: Tue, 9 Mar 2010 21:53:55 +1100
> > From: Peter Jeremy 
> > Sender: owner-freebsd-sta...@freebsd.org
> > 
> > On 2010-Mar-09 10:15:53 +0100, Stefan Bethke  wrote:
> > >Over the past couple of months, I've more or less regularly observed 
> > >machines having more and more processes stuck in the zfs wchan.  The 
> > >processes never recover from that,
> > 
> > How long have you waited?
> > 
> > There seems to be a problem with low free memory handling that causes ZFS
> > to turn into cold molasses.  The work-around is to run a program that
> > allocates a decent size chunk of memory and then exits.  The original
> > suggestion was something like:
> > perl -e '@x = (0) x 100;'
> > I've written a short program that allocates and dirties ~100MB and then
> > exits and run it from cron.
> 
> Sigh! I found it. I build my systems without NIS and I had the stock
> nsswitch.conf file. Fixed.
> 
> /me banging my head against the desk.
> 
> Thanks!
> -- 
> R. Kevin Oberman, Network Engineer
> Energy Sciences Network (ESnet)
> Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
> E-mail: ober...@es.netPhone: +1 510 486-8634
> Key fingerprint:059B 2DDF 031C 9BA3 14A4  EADA 927D EBB3 987B 3751
> 

Re: Cron output mail lost with update to RELENG_7

2010-03-09 Thread Kevin Oberman

> Date: Fri, 5 Mar 2010 12:33:09 -0800
> From: Jeremy Chadwick 
> Sender: owner-freebsd-sta...@freebsd.org
> 
> On Fri, Mar 05, 2010 at 11:32:47AM -0800, Kevin Oberman wrote:
> > I have discovered a problem with the mail sent by cron jobs (I refer
> > only to logs, not invocations of mail from scripts.) They never are
> > delivered.
> > Mar  5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:32:30 noc5 postfix/sendmail[1175]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient 
> > addresses found in message header
> > Mar  5 10:37:00 noc5 postfix/sendmail[1268]: fatal: root(0): No recipient 
> > addresses found in message header
> > 
> > This showed up when I upgraded the system to RELENG_7 yesterday. My
> > previous install was RELENG_7 of May 2, 2009 and it delivered the logs
> > without any problems. No other changes were made. postfix was 2.6.5.
> > 
> > I have this same issue on all 8.0 systems I have, but I was blaming a
> > fault in postfix config. Now I realize that this is not the problem.
> > 
> > I really don't know quite where to look for this. Any clues would be
> > appreciated. 
> 
> I don't have this issue on any of our RELENG_7 or RELENG_8 systems, all
> of which use postfix and WITHOUT_SENDMAIL in /etc/src.conf.
> 
> It sounds like cron is trying to spawn something like mail(1) (more
> likely /usr/sbin/sendmail; would have to look at the code) and passing
> it either incorrect flags or actual content within the header itself,
> e.g. a missing To: line.
> 
> Since postfix is involved, have you verified your /etc/mail
> configuration to make sure mailwrapper is referring to the correct
> postfix binaries?
> 
> The only other thing I can think of would be, possibly, some sort of
> cronjob root has (either crontab -l or /etc/crontab) which makes use of
> the MAILTO environment variable.  See cron(8) for what I'm talking
> about.
> 
> You might have to run cron in debug mode (see -x flag; your argument
> list will probably be quite long :-) ) to see what it's doing.
> Otherwise truss or ktrace might be the only way to track down what's
> going on underneath.
> 
> -- 
> | Jeremy Chadwick   j...@parodius.com |
> | Parodius Networking   http://www.parodius.com/ |
> | UNIX Systems Administrator  Mountain View, CA, USA |
> | Making life hard for others since 1977.  PGP: 4BD6C0CB |
> 

ugen kernel module?

2010-03-09 Thread Михаил Кипа

In FreeBSD 7 there was a ugen.ko kernel module and I could use apcupsd with USB 
devices, but in the newer FreeBSD there is no such module. How can I use an APC 
power supply with a USB interface (I mean using the apcupsd port)?

Re: ugen kernel module?

2010-03-09 Thread Dan Nelson

In the last episode (Mar 10):
> In FreeBSD7 there was ugen.ko kernel module and I can use apcupsd with USB
> devices, but in FreeBSD there is no such module, how can I use APC power
> supply with usb interface (I mean usage of the apcupsd port)?

It's built into the usb subsystem now.  All USB devices (including USB hubs
and devices controlled by other drivers) now have a ugen device.  Try
running "usbconfig list" to show them.  I bet your UPS has just moved to a
different ugen number.
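
If the list is long, the UPS can be picked out of the output by its description string. A small sketch in Python; the sample lines are written by hand to resemble what "usbconfig list" prints and are not captured from a real machine, so check the pattern against your own output:

```python
import re

# Matches the leading "ugenBUS.ADDR: <description>" part of each line.
UGEN_LINE = re.compile(r"^(ugen\d+\.\d+):\s*<([^>]*)>")


def find_devices(listing, keyword):
    """Return (ugen name, description) pairs whose description contains keyword."""
    matches = []
    for line in listing.splitlines():
        m = UGEN_LINE.match(line.strip())
        if m and keyword.lower() in m.group(2).lower():
            matches.append((m.group(1), m.group(2)))
    return matches


# Hypothetical sample text, for illustration only.
SAMPLE = """\
ugen0.1: <UHCI root HUB Intel> at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON
ugen0.2: <Back-UPS ES 550 American Power Conversion> at usbus0, cfg=0 md=HOST spd=LOW (1.5Mbps) pwr=ON
"""

if __name__ == "__main__":
    for name, description in find_devices(SAMPLE, "American Power Conversion"):
        print(f"{name}: {description}")
```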

-- 
Dan Nelson
dnel...@allantgroup.com

Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap

2010-03-09 Thread Linghua Tseng


Thanks.

I have tried to modify my /etc/nsswitch.conf to:

group: compat
group_compat: ldap
hosts: files dns
networks: files
passwd: compat
passwd_compat: ldap
shells: files
services: compat
services_compat: nis
protocols: files
rpc: files

But the problem still occurs.

--
From: "Greg Byshenk" 
Sent: Wednesday, March 10, 2010 3:11 AM
To: "Linghua Tseng" 
Cc: "Peter C. Lai" ; 
Subject: Re: Supplementary groups on LDAP cannot work with RELENG_8 +nss_ldap


On Tue, Mar 09, 2010 at 09:00:49AM +0800, Linghua Tseng wrote:

Here is the output of `diff -u /usr/src/etc/nsswitch.conf 
/etc/nsswitch.conf'.

--- /usr/src/etc/nsswitch.conf  2010-03-08 09:04:25.0 +0800
+++ /etc/nsswitch.conf  2010-03-08 18:01:08.0 +0800
@@ -1,13 +1,13 @@
#
# nsswitch.conf(5) - name service switch configuration file
-# $FreeBSD: src/etc/nsswitch.conf,v 1.1.10.1 2009/08/03 08:13:06 kensmith Exp $
+# $FreeBSD: src/etc/nsswitch.conf,v 1.1 2006/05/03 15:14:47 ume Exp $
#
group: compat
-group_compat: nis
+group_compat: ldap nis
hosts: files dns
networks: files
passwd: compat
-passwd_compat: nis
+passwd_compat: ldap nis
shells: files
services: compat
services_compat: nis

The line `+:*' has already been put into /etc/master.passwd,
and the line `+:*::' has already been put into /etc/group.


I may be completely wrong (I can't seem to find the source), and I
don't know if it is the source of your problem, but I recall it being
reported that 'passwd_compat' and 'group_compat' require a *single*
source entry. 



--
greg byshenk  -  gbysh...@byshenk.net  -  Leiden, NL



