Re: kernel's make fails in ath module, stable9

2013-01-15 Thread Jakub Lach
See the tinderbox log, my mail to -wireless, etc.

I'm waiting for a fix too.



--
View this message in context: 
http://freebsd.1045724.n5.nabble.com/kernel-s-make-fails-in-ath-module-stable9-tp579p5777859.html
Sent from the freebsd-stable mailing list archive at Nabble.com.
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: kernel's make fails in ath module, stable9

2013-01-15 Thread Jakub Lach
...should be fixed, as it is already reverted.





Re: kernel's make fails in ath module, stable9

2013-01-15 Thread Dimitry Andric

On 2013-01-15 12:31, Jakub Lach wrote:

...should be fixed, as it is already reverted.


Yes, sorry about that breakage.  It should be fixed as of r245449.

The good news is that stable/9 now has clang 3.2 release. :-)


Re: IPv6 Tunnel Shared With Jails via epair Devices

2013-01-15 Thread Shawn Webb
On Tue, Jan 15, 2013 at 12:29 AM, Ben Morrow wrote:

> Quoth Shawn Webb :
> >
> > I've been working on sharing a 6in4 IPv6 tunnel (via a gif device) I have
> > with Hurricane Electric (tunnelbroker.net) to my jails via epair devices.
> > My setup is a bit unique in that the IPv6 tunnel is behind an OpenVPN
> > connection. I've had varying degrees of success. I might have a bug to
> > report, but I thought I'd post here to get input from people who know
> > better than I do about these kinds of things.
> >
> > I have a bridge device (we'll call it bridge0) with a /64 IPv6 address
> > (2001:470:8142:1::1). Each jail's epair[n]b device will get an IPv6 address
> > in that same prefix. For example, one of my jails is 2001:470:8142:1::3.
> > The default IPv6 gateway is the IPv6 address of bridge0.
> >
> > Giving one jail an IP address works fine. For each jail after that, the
> > IPv6 address stays in tentative mode. FreeBSD gets stuck trying to use DAD
> > to figure out if there's an address conflict. It never leaves tentative
> > mode. This is the bug I'm working out.
> >
> > Here's bridge0's config:
> >
> > # ifconfig bridge0
> > bridge0: flags=8843 metric 0 mtu 1500
> > ether 02:fe:21:34:d3:00
> > inet6 2001:470:8142:1::1 prefixlen 64
> > nd6 options=21
> > id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> > maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> > root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> > member: epair0a flags=143
> >ifmaxaddr 0 port 19 priority 128 path cost 2000
> > member: epair1a flags=143
> >ifmaxaddr 0 port 21 priority 128 path cost 2000
> > member: bge0 flags=143
> >ifmaxaddr 0 port 5 priority 128 path cost 20
>
> Why have you added the physical interface to the bridge? AFAICT you
> don't need to: a bridge will bridge epairs just fine, and as you
> explained in that blog post you have to route rather than bridge into
> the tunnel, since the tunnel isn't an Ethernet device.
>

I did it so that I have an IPv4 address directly on the LAN for each of my
jails.


>
> > Here's the relevant epair device for the jail whose IPv6 stack is working:
> >
> > # jexec "ClamAV_Dev" ifconfig epair1b
> > epair1b: flags=8843 metric 0 mtu 1500
> > options=8
> > ether 02:fb:c0:00:16:0b
> > inet6 2001:470:8142:1::3 prefixlen 64
> > inet6 fe80::fb:c0ff:fe00:160b%epair1b prefixlen 64 scopeid 0x2
> > inet 10.7.1.172 netmask 0xfe00 broadcast 10.7.1.255
> > nd6 options=21
> > media: Ethernet 10Gbase-T (10Gbase-T )
> > status: active
> >
> > Here's the relevant epair device for the jail whose IPv6 stack isn't
> > working:
> >
> > # jexec "Dev Template" ifconfig epair0b
> > epair0b: flags=8843 metric 0 mtu 1500
> > options=8
> > ether 02:80:03:00:14:0b
> > inet6 2001:470:8142:1::5 prefixlen 64 tentative
> > inet6 fe80::80:3ff:fe00:140b%epair0b prefixlen 64 tentative scopeid 0x2
> > inet 10.7.1.92 netmask 0xfe00 broadcast 10.7.1.255
> > nd6 options=29
>
> I suspect the addresses are only marked tentative because the interface
> has been marked IFDISABLED. This causes all current addresses to be
> marked tentative, because the kernel isn't allowed to send or receive
> IPv6 packets and so can't defend the addresses any more.
>
> Is it possible something in the jail's startup scripts is causing the
> interface to be marked IFDISABLED after the inet6 address has been
> assigned? Some of the functions in network.subr mark interfaces
> IFDISABLED automatically if they don't think they have IPv6 addresses.
>

I was thinking the same thing. One problem is that I can't remove the
IFDISABLED flag. This is what happens when I try:

# jexec "Dev Template" ifconfig epair0b -ifdisabled
ifconfig: ioctl(SIOCGIFINFO_IN6): Invalid argument


>
> > media: Ethernet 10Gbase-T (10Gbase-T )
> > status: active
> >
> > I brought up the "Dev Template" jail after bringing up the ClamAV_Dev jail.
> > If there's any other output you'd like to see, let me know. If you're
> > confused about my setup, visit my blog post about the subject here:
> >
> > http://0xfeedface.org/blog/lattera/2013-01-12/tunneled-ipv6-freebsd-jails
> >
> > I'm curious to know if I've got a legit bug or if it's something I'm doing
> > wrong. The one thing I haven't tried is setting up rtadvd on the bridge.
> > That'd be kind of interesting, since my physical NIC is a member on the
> > bridge. I'd rather not dish out IPv6 addresses for all devices on the
> > network (a network with lots of devices I don't own or control).
>
> As I said, I don't believe you need the physical interface on the
> bridge, unless you have to for IPv4 (and you can't route or proxyarp
> instead). However, before you can run rtadvd you will need to give the
> bridge its proper link-local address, which probably also means locking
> down its hardware address in rc.conf. Bridges don't get auto link-local
> addresses, for reasons I've never entirely understood, and RAs have to
> use ll addresses.
>
> You wil

Re: make release doesn't correctly include EXTLOCALDIR ?

2013-01-15 Thread Fleuriot Damien
On Jan 11, 2013, at 2:06 PM, Fleuriot Damien wrote:

> Hello list,
> 
> 
> I'm running 8.3-stable r245223 from a mere 2 days ago and am in the process 
> of building a custom release for our internal use as preconfigured firewalls.
> 
> "make release" works pretty fine except for a few quirks here and there.
> 
> 
> 
> First of all, I have set EXTLOCALDIR so that the release contains my existing 
> /usr/local/ , and thus the collection of installed ports.
> 
> The problem here is that while /release/usr/local/ is correctly populated, 
> the ISO images and ftp install directory have an empty usr/local/
> Extracting the ISO's base.?? files doesn't yield the /usr/local/ contents 
> either.
> 
> 
> 
> 
> The second problem I encounter is with the kernel's build.
> Apparently "make release" doesn't pull MODULES_OVERRIDE from /etc/make.conf 
> and decides to build every single module, as opposed to my own restricted 
> list.
> 
> I'm going to try with KERNEL_FLAGS=-DMODULES_OVERRIDE module1 module2 in 
> /usr/src/release/Makefile
> 
> 
> 
> Has anyone else ever experienced the same problem regarding the inclusion of 
> /usr/local/ in their release ?
> 


Reposting to -stable in the hope of getting feedback, having received none on 
-questions.


Has anyone experienced this before ?
Is this intended behaviour ?

I fail to see the purpose of including /usr/local/ if it won't be packaged into 
the release images.



Re: IPv6 Tunnel Shared With Jails via epair Devices

2013-01-15 Thread Ben Morrow
Quoth Shawn Webb:
> On Tue, Jan 15, 2013 at 12:29 AM, Ben Morrow wrote:
> > Quoth Shawn Webb:
> > >
> > > # ifconfig bridge0
> > > bridge0: flags=8843 metric 0 mtu 1500
> > > ether 02:fe:21:34:d3:00
> > > inet6 2001:470:8142:1::1 prefixlen 64
> > > nd6 options=21
> > > id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> > > maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> > > root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> > > member: epair0a flags=143
> > >ifmaxaddr 0 port 19 priority 128 path cost 2000
> > > member: epair1a flags=143
> > >ifmaxaddr 0 port 21 priority 128 path cost 2000
> > > member: bge0 flags=143
> > >ifmaxaddr 0 port 5 priority 128 path cost 20
> >
> > Why have you added the physical interface to the bridge? AFAICT you
> > don't need to: a bridge will bridge epairs just fine, and as you
> > explained in that blog post you have to route rather than bridge into
> > the tunnel, since the tunnel isn't an Ethernet device.
> 
> I did it so that I have an IPv4 address directly on the LAN for each of my
> jails.

Hmm, OK. 

> > > # jexec "Dev Template" ifconfig epair0b
> > > epair0b: flags=8843 metric 0 mtu 1500
> > > options=8
> > > ether 02:80:03:00:14:0b
> > > inet6 2001:470:8142:1::5 prefixlen 64 tentative
> > > inet6 fe80::80:3ff:fe00:140b%epair0b prefixlen 64 tentative scopeid 0x2
> > > inet 10.7.1.92 netmask 0xfe00 broadcast 10.7.1.255
> > > nd6 options=29
> >
> > I suspect the addresses are only marked tentative because the interface
> > has been marked IFDISABLED. This causes all current addresses to be
> > marked tentative, because the kernel isn't allowed to send or receive
> > IPv6 packets and so can't defend the addresses any more.
> >
> > Is it possible something in the jail's startup scripts is causing the
> > interface to be marked IFDISABLED after the inet6 address has been
> > assigned? Some of the functions in network.subr mark interfaces
> > IFDISABLED automatically if they don't think they have IPv6 addresses.
> 
> I was thinking the same thing. One problem is that I can't remove the
> IFDISABLED flag. This is what happens when I try:
> 
> # jexec "Dev Template" ifconfig epair0b -ifdisabled
> ifconfig: ioctl(SIOCGIFINFO_IN6): Invalid argument

ifconfig epair0b inet6 -ifdisabled

I don't know why you get that error when you miss out the 'inet6'; it's
not exactly very clear.
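
For anyone following along, the two jails differ in their "nd6 options=" value: 21 on the working interface and 29 on the stuck one. Below is a minimal sketch of how to read that value. It assumes the ND6_IFF_* bit values from FreeBSD's netinet6/nd6.h (PERFORMNUD 0x1, IFDISABLED 0x8, AUTO_LINKLOCAL 0x20). The function names are hypothetical and not part of any FreeBSD tool, and the second helper only prints the command rather than running it:

```sh
#!/bin/sh
# Hypothetical helpers for illustration; bit values are assumed from
# FreeBSD's <netinet6/nd6.h> and only three of the defined bits are decoded.

# nd6_flag_names HEX: print the names of the known bits set in HEX, where
# HEX is the value shown after "nd6 options=" in ifconfig output.
nd6_flag_names() {
    value=$(( 0x$1 ))
    names=""
    [ $(( value & 0x1 )) -ne 0 ] && names="$names PERFORMNUD"
    [ $(( value & 0x8 )) -ne 0 ] && names="$names IFDISABLED"
    [ $(( value & 0x20 )) -ne 0 ] && names="$names AUTO_LINKLOCAL"
    echo "${names# }"
}

# clear_ifdisabled_cmd JAIL IFACE: print (do not run) the command that
# clears the flag, using the "inet6 -ifdisabled" form given above.
clear_ifdisabled_cmd() {
    printf 'jexec "%s" ifconfig %s inet6 -ifdisabled\n' "$1" "$2"
}

nd6_flag_names 21   # working jail
nd6_flag_names 29   # stuck jail: the extra 0x8 bit is IFDISABLED
clear_ifdisabled_cmd "Dev Template" epair0b
```

The only difference between the two values is the 0x8 bit, which matches the suspicion that the interface was marked IFDISABLED rather than DAD itself being stuck.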

Ben



Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE]

2013-01-15 Thread olivier
Dear All,
Still experiencing the same hangs I reported earlier with 9.1. I've been
running a kernel with WITNESS enabled to provide more information.

During an occurrence of the hang, running show alllocks gave

Process 25777 (sysctl) thread 0xfe014c5b2920 (102567)
exclusive sleep mutex Giant (Giant) r = 0 (0x811e34c0) locked @
/usr/src/sys/dev/usb/usb_transfer.c:3171
Process 25750 (sshd) thread 0xfe015a688000 (104313)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0bb98) locked @
/usr/src/sys/kern/uipc_sockbuf.c:148
Process 24922 (cnid_dbd) thread 0xfe0187ac4920 (103597)
shared lockmgr zfs (zfs) r = 0 (0xfe0973062488) locked @
/usr/src/sys/kern/vfs_syscalls.c:3591
Process 24117 (sshd) thread 0xfe07bd914490 (104195)
exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0a8f0) locked @
/usr/src/sys/kern/uipc_sockbuf.c:148
Process 1243 (java) thread 0xfe01ca85d000 (102704)
exclusive sleep mutex pmap (pmap) r = 0 (0xfe015aec1440) locked @
/usr/src/sys/amd64/amd64/pmap.c:4840
exclusive rw pmap pv global (pmap pv global) r = 0 (0x81409780)
locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
exclusive sleep mutex vm page (vm page) r = 0 (0x813f0a80) locked @
/usr/src/sys/vm/vm_object.c:1128
exclusive sleep mutex vm object (standard object) r = 0
(0xfe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
shared sx vm map (user) (vm map (user)) r = 0 (0xfe015aec1388) locked @
/usr/src/sys/vm/vm_map.c:2045
Process 994 (nfsd) thread 0xfe015a0df000 (102426)
shared lockmgr zfs (zfs) r = 0 (0xfe0c3b505878) locked @
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 994 (nfsd) thread 0xfe015a0f8490 (102422)
exclusive lockmgr zfs (zfs) r = 0 (0xfe02db3b3e60) locked @
/usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
Process 931 (syslogd) thread 0xfe015af18920 (102365)
shared lockmgr zfs (zfs) r = 0 (0xfe0141dd6680) locked @
/usr/src/sys/kern/vfs_syscalls.c:3591
Process 22 (syncer) thread 0xfe0125077000 (100279)
exclusive lockmgr syncer (syncer) r = 0 (0xfe015a2ff680) locked @
/usr/src/sys/kern/vfs_subr.c:1809

I don't have full "show lockedvnods" output because the output does not get
captured by ddb after using "capture on", it doesn't fit on a single
screen, and doesn't get piped into a "more" equivalent. What I did manage
to get (copied by hand, typos possible) is:

0xfe0c3b5057e0: 0xfe0c3b5057e0: tag zfs, type VREG
tag zfs, type VREG
usecount 1, writecount 0, refcount 1 mountedhere 0
usecount 1, writecount 0, refcount 1 mountedhere 0
flags (VI_ACTIVE)
flags (VI_ACTIVE)
v_object 0xfe089bc1b828 ref 0 pages 0
v_object 0xfe089bc1b828 ref 0 pages 0
lock type zfs: SHARED (count 1)
lock type zfs: SHARED (count 1)

0xfe02db3b3dc8: 0xfe02db3b3dc8: tag zfs, type VREG
tag zfs, type VREG
usecount 6, writecount 0, refcount 6 mountedhere 0
usecount 6, writecount 0, refcount 6 mountedhere 0
flags (VI_ACTIVE)
flags (VI_ACTIVE)
v_object 0xfe0b79583ae0 ref 0 pages 0
v_object 0xfe0b79583ae0 ref 0 pages 0
lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
with exclusive waiters pending
with exclusive waiters pending

The output of show witness is at http://pastebin.com/eSRb3FEu

The output of alltrace is at http://pastebin.com/X1LruNrf (a number of
threads are stuck in zio_wait, none I can find in zio_interrupt, and
according to gstat and disks eventually going to sleep all disk IO seems to
be stuck for good; I think Andriy explained earlier that these criteria
might indicate this is a ZFS hang).

The output of show geom is at http://pastebin.com/6nwQbKr4

The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 Interrupts are
occurring at a normal rate during the hang, as far as I can tell.

Any help would be greatly appreciated.
Thanks
Olivier
PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci
from 9.0 (in the hope it would fix the hangs I was experiencing in plain
9-STABLE; obviously the hangs are still occurring). The rest of my
configuration is the same as posted earlier.

On Mon, Dec 24, 2012 at 9:42 PM, olivier wrote:

> Dear All
> It turns out that reverting to an older version of the mps driver did not
> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all
> (they just took a bit longer to occur again, possibly just by chance). I
> followed steps along lines suggested by Andriy to collect more information
> when the problem occurs. Hopefully this will help figure out what's going
> on.
>
> As far as I can tell, what happens is that at some point IO operations to
> a bunch of drives that belong to different pools get stuck. For these
> drives, gstat shows no activity but 1 pending operation, as such:
>
>  L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w    d/s   kBps   ms/d   %busy Name
> 1   

Re: IPv6 Tunnel Shared With Jails via epair Devices

2013-01-15 Thread Shawn Webb
On Tue, Jan 15, 2013 at 2:54 PM, Ben Morrow wrote:

> Quoth Shawn Webb :
> > On Tue, Jan 15, 2013 at 12:29 AM, Ben Morrow  wrote:
> > > Quoth Shawn Webb :
> > > >
> > > > # ifconfig bridge0
> > > > bridge0: flags=8843 metric 0 mtu 1500
> > > > ether 02:fe:21:34:d3:00
> > > > inet6 2001:470:8142:1::1 prefixlen 64
> > > > nd6 options=21
> > > > id 00:00:00:00:00:00 priority 32768 hellotime 2 fwddelay 15
> > > > maxage 20 holdcnt 6 proto rstp maxaddr 2000 timeout 1200
> > > > root id 00:00:00:00:00:00 priority 32768 ifcost 0 port 0
> > > > member: epair0a flags=143
> > > >ifmaxaddr 0 port 19 priority 128 path cost 2000
> > > > member: epair1a flags=143
> > > >ifmaxaddr 0 port 21 priority 128 path cost 2000
> > > > member: bge0 flags=143
> > > >ifmaxaddr 0 port 5 priority 128 path cost 20
> > >
> > > Why have you added the physical interface to the bridge? AFAICT you
> > > don't need to: a bridge will bridge epairs just fine, and as you
> > > explained in that blog post you have to route rather than bridge into
> > > the tunnel, since the tunnel isn't an Ethernet device.
> >
> > > I did it so that I have an IPv4 address directly on the LAN for each of
> > > my jails.
>
> Hmm, OK.
>
> > > > # jexec "Dev Template" ifconfig epair0b
> > > > epair0b: flags=8843 metric 0 mtu 1500
> > > > options=8
> > > > ether 02:80:03:00:14:0b
> > > > inet6 2001:470:8142:1::5 prefixlen 64 tentative
> > > > inet6 fe80::80:3ff:fe00:140b%epair0b prefixlen 64 tentative scopeid 0x2
> > > > inet 10.7.1.92 netmask 0xfe00 broadcast 10.7.1.255
> > > > nd6 options=29
> > >
> > > I suspect the addresses are only marked tentative because the interface
> > > has been marked IFDISABLED. This causes all current addresses to be
> > > marked tentative, because the kernel isn't allowed to send or receive
> > > IPv6 packets and so can't defend the addresses any more.
> > >
> > > Is it possible something in the jail's startup scripts is causing the
> > > interface to be marked IFDISABLED after the inet6 address has been
> > > assigned? Some of the functions in network.subr mark interfaces
> > > IFDISABLED automatically if they don't think they have IPv6 addresses.
> >
> > I was thinking the same thing. One problem is that I can't remove the
> > IFDISABLED flag. This is what happens when I try:
> >
> > # jexec "Dev Template" ifconfig epair0b -ifdisabled
> > ifconfig: ioctl(SIOCGIFINFO_IN6): Invalid argument
>
> ifconfig epair0b inet6 -ifdisabled
>
> I don't know why you get that error when you miss out the 'inet6'; it's
> not exactly very clear.
>

Ah. That works. I'll just have to add that to my scripts. Since the device
won't come out of tentative mode without manually removing the ifdisabled
flag, should I go ahead and file a PR? It'd be nice if I could at the very
least set a timeout for DAD.


>
> Ben
>
>


Re: IPv6 Tunnel Shared With Jails via epair Devices

2013-01-15 Thread Shawn Webb
Somehow there ended up a typo in the CC to freebsd-stable@freebsd.org. Last
email below:

On Tue, Jan 15, 2013 at 5:53 PM, Shawn Webb wrote:

> On Tue, Jan 15, 2013 at 4:52 PM, Ben Morrow  wrote:
>
>> Quoth Shawn Webb :
>> > On Tue, Jan 15, 2013 at 2:54 PM, Ben Morrow  wrote:
>> > >
>> > > ifconfig epair0b inet6 -ifdisabled
>> > >
>> > > I don't know why you get that error when you miss out the 'inet6'; it's
>> > > not exactly very clear.
>> > >
>> >
>> > Ah. That works. I'll just have to add that to my scripts. Since the device
>> > won't come out of tentative mode without manually removing the ifdisabled
>> > flag, should I go ahead and file a PR? It'd be nice if I could at the very
>> > least set a timeout for DAD.
>>
>> DAD already has a timeout: it succeeds iff no packets indicating someone
>> else is using the address are received in a given time. The only reason
>> for an address remaining tentative indefinitely (without transitioning
>> to either valid or duplicated) is if IPv6 on that interface has been
>> disabled entirely by setting IFDISABLED. If DAD fails for the LL address
>> the interface is marked IFDISABLED but the LL address is marked
>> duplicated rather than tentative.
>>
>
> I figured it out. In my jail initialization scripts, I'm running '/bin/sh
> /bin/rc' after doing initial network setup. The rc script puts the
> interface in IFDISABLED mode. So if I run the ifconfig command to remove
> the flag, I'm golden. I've committed and pushed the code that fixes the
> problem in my scripts. If you're curious, you can look at
> https://github.com/lattera/drupal-jailadmin/commit/cbf8509712c3dd237bbc020f49f63b51507b7be4
>
> Thanks for the help. I really appreciate it.
>


Re: IPv6 Tunnel Shared With Jails via epair Devices

2013-01-15 Thread Ben Morrow
At 5PM -0500 on 15/01/13 you (Shawn Webb) wrote:
> 
> I figured it out. In my jail initialization scripts, I'm running '/bin/sh
> /bin/rc' after doing initial network setup. The rc script puts the
> interface in IFDISABLED mode. So if I run the ifconfig command to remove
> the flag, I'm golden.

Yes, that's what I thought. You should be able to avoid this by
specifying either

ifconfig_epair0b_ipv6="inet6 auto_linklocal"

or

ipv6_activate_all_interfaces="YES"

in the jail's rc.conf. This is cleaner than running ifconfig explicitly
outside the jail.

Ben



Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE]

2013-01-15 Thread Reed A. Cartwright
I don't know if this is relevant or not, but a deadlock was recently
fixed in the VFS code:

http://svnweb.freebsd.org/base?view=revision&revision=244795

On Tue, Jan 15, 2013 at 12:55 PM, olivier wrote:
> Dear All,
> Still experiencing the same hangs I reported earlier with 9.1. I've been
> running a kernel with WITNESS enabled to provide more information.
>
> During an occurrence of the hang, running show alllocks gave
>
> Process 25777 (sysctl) thread 0xfe014c5b2920 (102567)
> exclusive sleep mutex Giant (Giant) r = 0 (0x811e34c0) locked @
> /usr/src/sys/dev/usb/usb_transfer.c:3171
> Process 25750 (sshd) thread 0xfe015a688000 (104313)
> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0bb98) locked @
> /usr/src/sys/kern/uipc_sockbuf.c:148
> Process 24922 (cnid_dbd) thread 0xfe0187ac4920 (103597)
> shared lockmgr zfs (zfs) r = 0 (0xfe0973062488) locked @
> /usr/src/sys/kern/vfs_syscalls.c:3591
> Process 24117 (sshd) thread 0xfe07bd914490 (104195)
> exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0a8f0) locked @
> /usr/src/sys/kern/uipc_sockbuf.c:148
> Process 1243 (java) thread 0xfe01ca85d000 (102704)
> exclusive sleep mutex pmap (pmap) r = 0 (0xfe015aec1440) locked @
> /usr/src/sys/amd64/amd64/pmap.c:4840
> exclusive rw pmap pv global (pmap pv global) r = 0 (0x81409780)
> locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
> exclusive sleep mutex vm page (vm page) r = 0 (0x813f0a80) locked @
> /usr/src/sys/vm/vm_object.c:1128
> exclusive sleep mutex vm object (standard object) r = 0
> (0xfe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
> shared sx vm map (user) (vm map (user)) r = 0 (0xfe015aec1388) locked @
> /usr/src/sys/vm/vm_map.c:2045
> Process 994 (nfsd) thread 0xfe015a0df000 (102426)
> shared lockmgr zfs (zfs) r = 0 (0xfe0c3b505878) locked @
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> Process 994 (nfsd) thread 0xfe015a0f8490 (102422)
> exclusive lockmgr zfs (zfs) r = 0 (0xfe02db3b3e60) locked @
> /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> Process 931 (syslogd) thread 0xfe015af18920 (102365)
> shared lockmgr zfs (zfs) r = 0 (0xfe0141dd6680) locked @
> /usr/src/sys/kern/vfs_syscalls.c:3591
> Process 22 (syncer) thread 0xfe0125077000 (100279)
> exclusive lockmgr syncer (syncer) r = 0 (0xfe015a2ff680) locked @
> /usr/src/sys/kern/vfs_subr.c:1809
>
> I don't have full "show lockedvnods" output because the output does not get
> captured by ddb after using "capture on", it doesn't fit on a single
> screen, and doesn't get piped into a "more" equivalent. What I did manage
> to get (copied by hand, typos possible) is:
>
> 0xfe0c3b5057e0: 0xfe0c3b5057e0: tag zfs, type VREG
> tag zfs, type VREG
> usecount 1, writecount 0, refcount 1 mountedhere 0
> usecount 1, writecount 0, refcount 1 mountedhere 0
> flags (VI_ACTIVE)
> flags (VI_ACTIVE)
> v_object 0xfe089bc1b828 ref 0 pages 0
> v_object 0xfe089bc1b828 ref 0 pages 0
> lock type zfs: SHARED (count 1)
> lock type zfs: SHARED (count 1)
>
> 0xfe02db3b3dc8: 0xfe02db3b3dc8: tag zfs, type VREG
> tag zfs, type VREG
> usecount 6, writecount 0, refcount 6 mountedhere 0
> usecount 6, writecount 0, refcount 6 mountedhere 0
> flags (VI_ACTIVE)
> flags (VI_ACTIVE)
> v_object 0xfe0b79583ae0 ref 0 pages 0
> v_object 0xfe0b79583ae0 ref 0 pages 0
> lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
> lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
> with exclusive waiters pending
> with exclusive waiters pending
>
> The output of show witness is at http://pastebin.com/eSRb3FEu
>
> The output of alltrace is at http://pastebin.com/X1LruNrf (a number of
> threads are stuck in zio_wait, none I can find in zio_interrupt, and
> according to gstat and disks eventually going to sleep all disk IO seems to
> be stuck for good; I think Andriy explained earlier that these criteria
> might indicate this is a ZFS hang).
>
> The output of show geom is at http://pastebin.com/6nwQbKr4
>
> The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 Interrupts are
> occurring at a normal rate during the hang, as far as I can tell.
>
> Any help would be greatly appreciated.
> Thanks
> Olivier
> PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci
> from 9.0 (in the hope it would fix the hangs I was experiencing in plain
> 9-STABLE; obviously the hangs are still occurring). The rest of my
> configuration is the same as posted earlier.
>
> On Mon, Dec 24, 2012 at 9:42 PM, olivier  wrote:
>
>> Dear All
>> It turns out that reverting to an older version of the mps driver did not
>> fix the ZFS hangs I've been struggling with in 9.1 and 9-STABLE after all
>> (they just took a bit longer to occur again, possibly just by chance). I
>> followed steps along lines suggested by Andriy to collect more information
>> when the probl

Re: CAM hangs in 9-STABLE? [Was: NFS/ZFS hangs after upgrading from 9.0-RELEASE to -STABLE]

2013-01-15 Thread olivier
My understanding is that the locks (and pieces of kernel code) involved are
different.
Maybe someone more knowledgeable than I am can comment.
Thanks for the suggestion...
Olivier


On Tue, Jan 15, 2013 at 4:07 PM, Reed A. Cartwright wrote:

> I don't know if this is relevant or not, but a deadlock was recently
> fixed in the VFS code:
>
> http://svnweb.freebsd.org/base?view=revision&revision=244795
>
> On Tue, Jan 15, 2013 at 12:55 PM, olivier wrote:
> > Dear All,
> > Still experiencing the same hangs I reported earlier with 9.1. I've been
> > running a kernel with WITNESS enabled to provide more information.
> >
> > During an occurrence of the hang, running show alllocks gave
> >
> > Process 25777 (sysctl) thread 0xfe014c5b2920 (102567)
> > exclusive sleep mutex Giant (Giant) r = 0 (0x811e34c0) locked @
> > /usr/src/sys/dev/usb/usb_transfer.c:3171
> > Process 25750 (sshd) thread 0xfe015a688000 (104313)
> > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0bb98) locked @
> > /usr/src/sys/kern/uipc_sockbuf.c:148
> > Process 24922 (cnid_dbd) thread 0xfe0187ac4920 (103597)
> > shared lockmgr zfs (zfs) r = 0 (0xfe0973062488) locked @
> > /usr/src/sys/kern/vfs_syscalls.c:3591
> > Process 24117 (sshd) thread 0xfe07bd914490 (104195)
> > exclusive sx so_rcv_sx (so_rcv_sx) r = 0 (0xfe0204e0a8f0) locked @
> > /usr/src/sys/kern/uipc_sockbuf.c:148
> > Process 1243 (java) thread 0xfe01ca85d000 (102704)
> > exclusive sleep mutex pmap (pmap) r = 0 (0xfe015aec1440) locked @
> > /usr/src/sys/amd64/amd64/pmap.c:4840
> > exclusive rw pmap pv global (pmap pv global) r = 0 (0x81409780)
> > locked @ /usr/src/sys/amd64/amd64/pmap.c:4802
> > exclusive sleep mutex vm page (vm page) r = 0 (0x813f0a80) locked @
> > /usr/src/sys/vm/vm_object.c:1128
> > exclusive sleep mutex vm object (standard object) r = 0
> > (0xfe01458e43a0) locked @ /usr/src/sys/vm/vm_object.c:1076
> > shared sx vm map (user) (vm map (user)) r = 0 (0xfe015aec1388) locked @
> > /usr/src/sys/vm/vm_map.c:2045
> > Process 994 (nfsd) thread 0xfe015a0df000 (102426)
> > shared lockmgr zfs (zfs) r = 0 (0xfe0c3b505878) locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> > Process 994 (nfsd) thread 0xfe015a0f8490 (102422)
> > exclusive lockmgr zfs (zfs) r = 0 (0xfe02db3b3e60) locked @
> > /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vfsops.c:1760
> > Process 931 (syslogd) thread 0xfe015af18920 (102365)
> > shared lockmgr zfs (zfs) r = 0 (0xfe0141dd6680) locked @
> > /usr/src/sys/kern/vfs_syscalls.c:3591
> > Process 22 (syncer) thread 0xfe0125077000 (100279)
> > exclusive lockmgr syncer (syncer) r = 0 (0xfe015a2ff680) locked @
> > /usr/src/sys/kern/vfs_subr.c:1809
> >
> > I don't have full "show lockedvnods" output because the output does not get
> > captured by ddb after using "capture on", it doesn't fit on a single
> > screen, and doesn't get piped into a "more" equivalent. What I did manage
> > to get (copied by hand, typos possible) is:
> >
> > 0xfe0c3b5057e0: 0xfe0c3b5057e0: tag zfs, type VREG
> > tag zfs, type VREG
> > usecount 1, writecount 0, refcount 1 mountedhere 0
> > usecount 1, writecount 0, refcount 1 mountedhere 0
> > flags (VI_ACTIVE)
> > flags (VI_ACTIVE)
> > v_object 0xfe089bc1b828 ref 0 pages 0
> > v_object 0xfe089bc1b828 ref 0 pages 0
> > lock type zfs: SHARED (count 1)
> > lock type zfs: SHARED (count 1)
> >
> > 0xfe02db3b3dc8: 0xfe02db3b3dc8: tag zfs, type VREG
> > tag zfs, type VREG
> > usecount 6, writecount 0, refcount 6 mountedhere 0
> > usecount 6, writecount 0, refcount 6 mountedhere 0
> > flags (VI_ACTIVE)
> > flags (VI_ACTIVE)
> > v_object 0xfe0b79583ae0 ref 0 pages 0
> > v_object 0xfe0b79583ae0 ref 0 pages 0
> > lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
> > lock type zfs: EXCL by thread 0xfe015a0f8490 (pid 994)
> > with exclusive waiters pending
> > with exclusive waiters pending
> >
> > The output of show witness is at http://pastebin.com/eSRb3FEu
> >
> > The output of alltrace is at http://pastebin.com/X1LruNrf (a number of
> > threads are stuck in zio_wait, none I can find in zio_interrupt, and
> > according to gstat and disks eventually going to sleep all disk IO seems to
> > be stuck for good; I think Andriy explained earlier that these criteria
> > might indicate this is a ZFS hang).
> >
> > The output of show geom is at http://pastebin.com/6nwQbKr4
> >
> > The output of vmstat -i is at http://pastebin.com/9LcZ7Mi0 Interrupts are
> > occurring at a normal rate during the hang, as far as I can tell.
> >
> > Any help would be greatly appreciated.
> > Thanks
> > Olivier
> > PS: my kernel was compiled from 9-STABLE from December, with CAM and ahci
> > from 9.0 (in the hope it would fix the hangs I was experiencing in plain
> > 9-STABLE; obviously the hangs are still occurring). The rest