On Thu, 10 Jan 2013 20:29:43 -0800
John Darrah <xyl...@gmail.com> wrote:

> On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote:
> > On Thu, 3 Jan 2013 21:29:22 -0800
> > John Darrah <xyl...@gmail.com> wrote:
> > 
> > > On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote:
> > > > On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote:
> > > > > On Sat, 29 Dec 2012 01:24:36 +0100
> > > > > Ben Hutchings <b...@decadent.org.uk> wrote:
> > > > > 
> > > > > > On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote:
> > > > > > > On Sun, 23 Dec 2012 09:10:34 -0500
> > > > > > > Jeff Layton <jlay...@redhat.com> wrote:
> > > > > > [...]
> > > > > > > > I had a look at the code today and suspect that I know what the 
> > > > > > > > problem
> > > > > > > > is. When the kernel goes to send a request, it first signs it 
> > > > > > > > and then
> > > > > > > > bumps the sequence numbers that it tracks. If the request 
> > > > > > > > doesn't
> > > > > > > > actually make it out onto the wire, like when the task catches a
> > > > > > > > signal, those sequence numbers remain high even though the 
> > > > > > > > request
> > > > > > > > didn't go out.
> > > > > > > > 
> > > > > > > > Here's an untested patch that might help tell whether this is 
> > > > > > > > the
> > > > > > > > case. You may want to try it and see if it does. Note that this 
> > > > > > > > fix is
> > > > > > > > a bit of a kludge and is not suitable for merging!
> > > > > > > > 
> > > > > > > > A better fix would involve changing when the sequence number 
> > > > > > > > gets
> > > > > > > > bumped in the first place. If this patch seems to help things, 
> > > > > > > > then
> > > > > > > > I'll look at coding up that up.
> > > > > > [...]
> > > > > > > I was able to reproduce this, and I don't think the above patch 
> > > > > > > will
> > > > > > > fix it (at least not completely). The problem seems to be that 
> > > > > > > the NT
> > > > > > > cancel command is screwing up the sequence numbers. We'll have to 
> > > > > > > do
> > > > > > > some research to figure out why that's occurring.
> > > > > > 
> > > > > > Jeff, we got a bug report in Debian which seems to be the same 
> > > > > > problem:
> > > > > > <http://bugs.debian.org/695492>.  Please cc John Darrah and the bug
> > > > > > address as above.
> > > > > > 
> > > > > > Ben.
> > > > > > 
> > > > > 
> > > > > You may want to try this patch. It seems to fix the problem for me, 
> > > > > but
> > > > > I think there is probably some more work to do in this area.
> > > > > 
> > > > > http://www.spinics.net/lists/linux-cifs/msg07576.html
> > > > > 
> > > > 
> > > > John, you can test this patch by following instructions at
> > > > <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>.
> > > > 
> > > > Please reply-to-all to Jeff's message when you have a result.
> > > > 
> > > > Ben.
> > > > 
> > > 
> 
> OK... I built a 3.2.35 kernel with the patch to transport.c 
> and also a 3.7.1 with the patch to smb1ops.c and loaded them 
> into my wheezy VM. I tested both by starting commands to 
> frob the CIFS mounts and then typing a CTRL-C to kill the 
> command, and they were stable (at least 50 attempts using 
> each kernel with the CTRL-C fired at random times into the 
> running command).
> 
> But... now another issue affects both kernels. It seems that 
> after 10 to 15 minutes of non use, the mount hangs and the 
> command accessing the mount can only be killed with a 
> SIGKILL... but only sometimes. Sometimes only a reboot 
> would unwedge things.
> 
> It seems when the mount would hang, I would get the:
>   CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting...
> error except the 3.7 kernel reported 120 seconds instead of 
> the 300 seconds noted above.
> 

Interesting, I haven't noticed that issue, but I'll try to reproduce it
when I get a chance.

> Below is one of the kernel logs after I SIGKILL'd things... 
> it looks like I trigered a fault of some kind. Maybe it has 
> some meaning (this log only happened once).
> 

Hmmm...

Looks like a problem in the virtualbox code. Certainly doesn't appear
to be cifs-related. It seems like we saw something similar when all of
the lockless dcache stuff went upstream, so it may be that the vbox
stuff needs to be forward-ported to handle that correctly.

> -- john
> 
> 
> Jan  7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started.
> Jan  7 07:06:34 jax kernel: [    0.000000] Initializing cgroup subsys cpuset
> Jan  7 07:06:34 jax kernel: [    0.000000] Initializing cgroup subsys cpu
> Jan  7 07:06:34 jax kernel: [    0.000000] Linux version 3.2.0-4-486 
> (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 
> Debian 3.2.35-2
> 
> <-a bunch removed->
> 
> Jan  7 08:30:31 jax kernel: [   17.072068] eth0: no IPv6 routers present
> Jan  7 08:31:17 jax kernel: [   63.273900] FS-Cache: Netfs 'cifs' registered 
> for caching
> Jan  7 08:31:17 jax kernel: [   63.304164] CIFS VFS: default security 
> mechanism requested.  The default security mechanism will be upgraded from 
> ntlm to ntlmv2 in kernel release 3.3
> Jan  7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not 
> responded in 300 seconds. Reconnecting...
> Jan  7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not 
> responded in 300 seconds. Reconnecting...
> Jan  7 09:06:57 jax kernel: [ 2203.298637] ------------[ cut here 
> ]------------
> Jan  7 09:06:57 jax kernel: [ 2203.298645] WARNING: at 
> /root/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x24/0x85()
> Jan  7 09:06:57 jax kernel: [ 2203.298648] Hardware name: VirtualBox
> Jan  7 09:06:57 jax kernel: [ 2203.298651] Modules linked in: des_generic ecb 
> md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd 
> sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc 
> snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev 
> serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg 
> sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd 
> ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: 
> scsi_wait_scan]
> Jan  7 09:06:57 jax kernel: [ 2203.298678] Pid: 2657, comm: bash Tainted: G   
>         O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan  7 09:06:57 jax kernel: [ 2203.298681] Call Trace:
> Jan  7 09:06:57 jax kernel: [ 2203.298686]  [<c1022a98>] ? 
> warn_slowpath_common+0x68/0x79
> Jan  7 09:06:57 jax kernel: [ 2203.298690]  [<c10b09c7>] ? 
> d_set_d_op+0x24/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298694]  [<c1022ab6>] ? 
> warn_slowpath_null+0xd/0x10
> Jan  7 09:06:57 jax kernel: [ 2203.298697]  [<c10b09c7>] ? 
> d_set_d_op+0x24/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298702]  [<f99a7f6f>] ? 
> sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298707]  [<f99a80da>] ? 
> sf_create_aux+0x127/0x18b [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298711]  [<f99a7887>] ? 
> sf_lookup+0x80/0x1df [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298715]  [<c10abd24>] ? 
> vfs_create+0x53/0x6a
> Jan  7 09:06:57 jax kernel: [ 2203.298718]  [<c10ac61c>] ? do_last+0x216/0x512
> Jan  7 09:06:57 jax kernel: [ 2203.298722]  [<c10acbc0>] ? 
> path_openat+0xa2/0x298
> Jan  7 09:06:57 jax kernel: [ 2203.298725]  [<c10ace5f>] ? 
> do_filp_open+0x23/0x5c
> Jan  7 09:06:57 jax kernel: [ 2203.298729]  [<c109d2d5>] ? 
> kmem_cache_alloc+0x39/0x89
> Jan  7 09:06:57 jax kernel: [ 2203.298734]  [<c12816ad>] ? 
> _cond_resched+0x5/0x18
> Jan  7 09:06:57 jax kernel: [ 2203.298740]  [<c10a329f>] ? 
> do_sys_open+0x54/0xd2
> Jan  7 09:06:57 jax kernel: [ 2203.298744]  [<c10a333b>] ? sys_open+0x1e/0x23
> Jan  7 09:06:57 jax kernel: [ 2203.298748]  [<c128245c>] ? 
> syscall_call+0x7/0xb
> Jan  7 09:06:57 jax kernel: [ 2203.298751] ---[ end trace 5c4181d85183f288 
> ]---
> Jan  7 09:06:57 jax kernel: [ 2203.298754] ------------[ cut here 
> ]------------
> Jan  7 09:06:57 jax kernel: [ 2203.298757] WARNING: at 
> /root/linux-3.2.35/fs/dcache.c:1295 d_set_d_op+0x48/0x85()
> Jan  7 09:06:57 jax kernel: [ 2203.298760] Hardware name: VirtualBox
> Jan  7 09:06:57 jax kernel: [ 2203.298763] Modules linked in: des_generic ecb 
> md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd 
> sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc 
> snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev 
> serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg 
> sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd 
> ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: 
> scsi_wait_scan]
> Jan  7 09:06:57 jax kernel: [ 2203.298781] Pid: 2657, comm: bash Tainted: G   
>      W  O 3.2.0-4-486 #1 Debian 3.2.35-2a~test
> Jan  7 09:06:57 jax kernel: [ 2203.298784] Call Trace:
> Jan  7 09:06:57 jax kernel: [ 2203.298787]  [<c1022a98>] ? 
> warn_slowpath_common+0x68/0x79
> Jan  7 09:06:57 jax kernel: [ 2203.298791]  [<c10b09eb>] ? 
> d_set_d_op+0x48/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298794]  [<c1022ab6>] ? 
> warn_slowpath_null+0xd/0x10
> Jan  7 09:06:57 jax kernel: [ 2203.298798]  [<c10b09eb>] ? 
> d_set_d_op+0x48/0x85
> Jan  7 09:06:57 jax kernel: [ 2203.298802]  [<f99a7f6f>] ? 
> sf_instantiate.isra.0+0x118/0x15c [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298806]  [<f99a80da>] ? 
> sf_create_aux+0x127/0x18b [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298810]  [<f99a7887>] ? 
> sf_lookup+0x80/0x1df [vboxsf]
> Jan  7 09:06:57 jax kernel: [ 2203.298814]  [<c10abd24>] ? 
> vfs_create+0x53/0x6a
> Jan  7 09:06:57 jax kernel: [ 2203.298818]  [<c10ac61c>] ? do_last+0x216/0x512
> Jan  7 09:06:57 jax kernel: [ 2203.298821]  [<c10acbc0>] ? 
> path_openat+0xa2/0x298
> Jan  7 09:06:57 jax kernel: [ 2203.298824]  [<c10ace5f>] ? 
> do_filp_open+0x23/0x5c
> Jan  7 09:06:57 jax kernel: [ 2203.298828]  [<c109d2d5>] ? 
> kmem_cache_alloc+0x39/0x89
> Jan  7 09:06:57 jax kernel: [ 2203.298832]  [<c12816ad>] ? 
> _cond_resched+0x5/0x18
> Jan  7 09:06:57 jax kernel: [ 2203.298835]  [<c10a329f>] ? 
> do_sys_open+0x54/0xd2
> Jan  7 09:06:57 jax kernel: [ 2203.298839]  [<c10a333b>] ? sys_open+0x1e/0x23
> Jan  7 09:06:57 jax kernel: [ 2203.298843]  [<c128245c>] ? 
> syscall_call+0x7/0xb
> Jan  7 09:06:57 jax kernel: [ 2203.298846] ---[ end trace 5c4181d85183f289 
> ]---
> 
> <log-end>

-- 
Jeff Layton <jlay...@redhat.com>


-- 
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/20130111082716.5dc81...@tlielax.poochiereds.net

Reply via email to