On Thu, 10 Jan 2013 20:29:43 -0800 John Darrah <xyl...@gmail.com> wrote:
> On Fri, Jan 04, 2013 at 07:09:33AM -0500, Jeff Layton wrote: > > On Thu, 3 Jan 2013 21:29:22 -0800 > > John Darrah <xyl...@gmail.com> wrote: > > > > > On Sat, Dec 29, 2012 at 12:26:07PM +0100, Ben Hutchings wrote: > > > > On Fri, 2012-12-28 at 22:01 -0500, Jeff Layton wrote: > > > > > On Sat, 29 Dec 2012 01:24:36 +0100 > > > > > Ben Hutchings <b...@decadent.org.uk> wrote: > > > > > > > > > > > On Mon, 2012-12-24 at 09:14 -0500, Jeff Layton wrote: > > > > > > > On Sun, 23 Dec 2012 09:10:34 -0500 > > > > > > > Jeff Layton <jlay...@redhat.com> wrote: > > > > > > [...] > > > > > > > > I had a look at the code today and suspect that I know what the > > > > > > > > problem > > > > > > > > is. When the kernel goes to send a request, it first signs it > > > > > > > > and then > > > > > > > > bumps the sequence numbers that it tracks. If the request > > > > > > > > doesn't > > > > > > > > actually make it out onto the wire, like when the task catches a > > > > > > > > signal, those sequence numbers remain high even though the > > > > > > > > request > > > > > > > > didn't go out. > > > > > > > > > > > > > > > > Here's an untested patch that might help tell whether this is > > > > > > > > the > > > > > > > > case. You may want to try it and see if it does. Note that this > > > > > > > > fix is > > > > > > > > a bit of a kludge and is not suitable for merging! > > > > > > > > > > > > > > > > A better fix would involve changing when the sequence number > > > > > > > > gets > > > > > > > > bumped in the first place. If this patch seems to help things, > > > > > > > > then > > > > > > > > I'll look at coding up that up. > > > > > > [...] > > > > > > > I was able to reproduce this, and I don't think the above patch > > > > > > > will > > > > > > > fix it (at least not completely). The problem seems to be that > > > > > > > the NT > > > > > > > cancel command is screwing up the sequence numbers. We'll have to > > > > > > > do > > > > > > > some research to figure out why that's occurring. > > > > > > > > > > > > Jeff, we got a bug report in Debian which seems to be the same > > > > > > problem: > > > > > > <http://bugs.debian.org/695492>. Please cc John Darrah and the bug > > > > > > address as above. > > > > > > > > > > > > Ben. > > > > > > > > > > > > > > > > You may want to try this patch. It seems to fix the problem for me, > > > > > but > > > > > I think there is probably some more work to do in this area. > > > > > > > > > > http://www.spinics.net/lists/linux-cifs/msg07576.html > > > > > > > > > > > > > John, you can test this patch by following instructions at > > > > <http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-official>. > > > > > > > > Please reply-to-all to Jeff's message when you have a result. > > > > > > > > Ben. > > > > > > > > > OK... I built a 3.2.35 kernel with the patch to transport.c > and also a 3.7.1 with the patch to smb1ops.c and loaded them > into my wheezy VM. I tested both by starting commands to > frob the CIFS mounts and then typing a CTRL-C to kill the > command, and they were stable (at least 50 attempts using > each kernel with the CTRL-C fired at random times into the > running command). > > But... now another issue affects both kernels. It seems that > after 10 to 15 minutes of non use, the mount hangs and the > command accessing the mount can only be killed with a > SIGKILL... but only sometimes. Sometimes only a reboot > would unwedge things. > > It seems when the mount would hang, I would get the: > CIFS VFS: Server amifile01 has not responded in 300 seconds. Reconnecting... > error except the 3.7 kernel reported 120 seconds instead of > the 300 seconds noted above. > Interesting, I haven't noticed that issue, but I'll try to reproduce it when I get a chance. > Below is one of the kernel logs after I SIGKILL'd things... > it looks like I trigered a fault of some kind. Maybe it has > some meaning (this log only happened once). > Hmmm... Looks like a problem in the virtualbox code. Certainly doesn't appear to be cifs-related. It seems like we saw something similar when all of the lockless dcache stuff went upstream, so it may be that the vbox stuff needs to be forward-ported to handle that correctly. > -- john > > > Jan 7 07:06:34 jax kernel: imklog 5.8.11, log source = /proc/kmsg started. > Jan 7 07:06:34 jax kernel: [ 0.000000] Initializing cgroup subsys cpuset > Jan 7 07:06:34 jax kernel: [ 0.000000] Initializing cgroup subsys cpu > Jan 7 07:06:34 jax kernel: [ 0.000000] Linux version 3.2.0-4-486 > (debian-kernel@lists.debian.org) (gcc version 4.6.3 (Debian 4.6.3-14) ) #1 > Debian 3.2.35-2 > > <-a bunch removed-> > > Jan 7 08:30:31 jax kernel: [ 17.072068] eth0: no IPv6 routers present > Jan 7 08:31:17 jax kernel: [ 63.273900] FS-Cache: Netfs 'cifs' registered > for caching > Jan 7 08:31:17 jax kernel: [ 63.304164] CIFS VFS: default security > mechanism requested. The default security mechanism will be upgraded from > ntlm to ntlmv2 in kernel release 3.3 > Jan 7 08:51:20 jax kernel: [ 1266.602096] CIFS VFS: Server amifile01 has not > responded in 300 seconds. Reconnecting... > Jan 7 08:51:20 jax kernel: [ 1266.602347] CIFS VFS: Server amifile02 has not > responded in 300 seconds. Reconnecting... > Jan 7 09:06:57 jax kernel: [ 2203.298637] ------------[ cut here > ]------------ > Jan 7 09:06:57 jax kernel: [ 2203.298645] WARNING: at > /root/linux-3.2.35/fs/dcache.c:1291 d_set_d_op+0x24/0x85() > Jan 7 09:06:57 jax kernel: [ 2203.298648] Hardware name: VirtualBox > Jan 7 09:06:57 jax kernel: [ 2203.298651] Modules linked in: des_generic ecb > md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd > sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc > snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev > serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg > sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd > ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: > scsi_wait_scan] > Jan 7 09:06:57 jax kernel: [ 2203.298678] Pid: 2657, comm: bash Tainted: G > O 3.2.0-4-486 #1 Debian 3.2.35-2a~test > Jan 7 09:06:57 jax kernel: [ 2203.298681] Call Trace: > Jan 7 09:06:57 jax kernel: [ 2203.298686] [<c1022a98>] ? > warn_slowpath_common+0x68/0x79 > Jan 7 09:06:57 jax kernel: [ 2203.298690] [<c10b09c7>] ? > d_set_d_op+0x24/0x85 > Jan 7 09:06:57 jax kernel: [ 2203.298694] [<c1022ab6>] ? > warn_slowpath_null+0xd/0x10 > Jan 7 09:06:57 jax kernel: [ 2203.298697] [<c10b09c7>] ? > d_set_d_op+0x24/0x85 > Jan 7 09:06:57 jax kernel: [ 2203.298702] [<f99a7f6f>] ? > sf_instantiate.isra.0+0x118/0x15c [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298707] [<f99a80da>] ? > sf_create_aux+0x127/0x18b [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298711] [<f99a7887>] ? > sf_lookup+0x80/0x1df [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298715] [<c10abd24>] ? > vfs_create+0x53/0x6a > Jan 7 09:06:57 jax kernel: [ 2203.298718] [<c10ac61c>] ? do_last+0x216/0x512 > Jan 7 09:06:57 jax kernel: [ 2203.298722] [<c10acbc0>] ? > path_openat+0xa2/0x298 > Jan 7 09:06:57 jax kernel: [ 2203.298725] [<c10ace5f>] ? > do_filp_open+0x23/0x5c > Jan 7 09:06:57 jax kernel: [ 2203.298729] [<c109d2d5>] ? > kmem_cache_alloc+0x39/0x89 > Jan 7 09:06:57 jax kernel: [ 2203.298734] [<c12816ad>] ? > _cond_resched+0x5/0x18 > Jan 7 09:06:57 jax kernel: [ 2203.298740] [<c10a329f>] ? > do_sys_open+0x54/0xd2 > Jan 7 09:06:57 jax kernel: [ 2203.298744] [<c10a333b>] ? sys_open+0x1e/0x23 > Jan 7 09:06:57 jax kernel: [ 2203.298748] [<c128245c>] ? > syscall_call+0x7/0xb > Jan 7 09:06:57 jax kernel: [ 2203.298751] ---[ end trace 5c4181d85183f288 > ]--- > Jan 7 09:06:57 jax kernel: [ 2203.298754] ------------[ cut here > ]------------ > Jan 7 09:06:57 jax kernel: [ 2203.298757] WARNING: at > /root/linux-3.2.35/fs/dcache.c:1295 d_set_d_op+0x48/0x85() > Jan 7 09:06:57 jax kernel: [ 2203.298760] Hardware name: VirtualBox > Jan 7 09:06:57 jax kernel: [ 2203.298763] Modules linked in: des_generic ecb > md4 hmac nls_utf8 cifs vboxsf(O) nfsd nfs nfs_acl auth_rpcgss fscache lockd > sunrpc loop snd_intel8x0 snd_ac97_codec snd_pcsp snd_pcm snd_page_alloc > snd_timer psmouse joydev parport_pc parport usbhid snd hid vboxguest(O) evdev > serio_raw battery ac ac97_bus soundcore button ext4 crc16 jbd2 mbcache sg > sr_mod sd_mod crc_t10dif cdrom ata_generic ata_piix ahci libahci ohci_hcd > ehci_hcd libata scsi_mod e1000 usbcore usb_common [last unloaded: > scsi_wait_scan] > Jan 7 09:06:57 jax kernel: [ 2203.298781] Pid: 2657, comm: bash Tainted: G > W O 3.2.0-4-486 #1 Debian 3.2.35-2a~test > Jan 7 09:06:57 jax kernel: [ 2203.298784] Call Trace: > Jan 7 09:06:57 jax kernel: [ 2203.298787] [<c1022a98>] ? > warn_slowpath_common+0x68/0x79 > Jan 7 09:06:57 jax kernel: [ 2203.298791] [<c10b09eb>] ? > d_set_d_op+0x48/0x85 > Jan 7 09:06:57 jax kernel: [ 2203.298794] [<c1022ab6>] ? > warn_slowpath_null+0xd/0x10 > Jan 7 09:06:57 jax kernel: [ 2203.298798] [<c10b09eb>] ? > d_set_d_op+0x48/0x85 > Jan 7 09:06:57 jax kernel: [ 2203.298802] [<f99a7f6f>] ? > sf_instantiate.isra.0+0x118/0x15c [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298806] [<f99a80da>] ? > sf_create_aux+0x127/0x18b [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298810] [<f99a7887>] ? > sf_lookup+0x80/0x1df [vboxsf] > Jan 7 09:06:57 jax kernel: [ 2203.298814] [<c10abd24>] ? > vfs_create+0x53/0x6a > Jan 7 09:06:57 jax kernel: [ 2203.298818] [<c10ac61c>] ? do_last+0x216/0x512 > Jan 7 09:06:57 jax kernel: [ 2203.298821] [<c10acbc0>] ? > path_openat+0xa2/0x298 > Jan 7 09:06:57 jax kernel: [ 2203.298824] [<c10ace5f>] ? > do_filp_open+0x23/0x5c > Jan 7 09:06:57 jax kernel: [ 2203.298828] [<c109d2d5>] ? > kmem_cache_alloc+0x39/0x89 > Jan 7 09:06:57 jax kernel: [ 2203.298832] [<c12816ad>] ? > _cond_resched+0x5/0x18 > Jan 7 09:06:57 jax kernel: [ 2203.298835] [<c10a329f>] ? > do_sys_open+0x54/0xd2 > Jan 7 09:06:57 jax kernel: [ 2203.298839] [<c10a333b>] ? sys_open+0x1e/0x23 > Jan 7 09:06:57 jax kernel: [ 2203.298843] [<c128245c>] ? > syscall_call+0x7/0xb > Jan 7 09:06:57 jax kernel: [ 2203.298846] ---[ end trace 5c4181d85183f289 > ]--- > > <log-end> -- Jeff Layton <jlay...@redhat.com> -- To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org Archive: http://lists.debian.org/20130111082716.5dc81...@tlielax.poochiereds.net