Hi list,

I can confirm that it is the bug you mentioned, Steven. Here is how I found it.
I recorded hourly zfskern and nfsd stats like this:

    echo "PROCSTAT" >> "$reportname"
    pgrep -S "(zfskern|nfsd)" | xargs procstat -kk >> "$reportname"

Luckily it crashed last night and logged this:

    1910 101508 nfsd nfsd: service mi_switch+0x186 sleepq_wait+0x42 _sleep+0x376 arc_lowmem+0x77 kmem_malloc+0xc1 uma_large_malloc+0x4a malloc+0xd9 arc_get_data_buf+0xb5 arc_read_nolock+0x1ec arc_read+0x93 dbuf_prefetch+0x12c dmu_zfetch_dofetch+0x10b dmu_zfetch+0xaf8 dbuf_read+0x4a7 dmu_buf_hold_array_by_dnode+0x16b dmu_buf_hold_array+0x67 dmu_read_uio+0x3f zfs_freebsd_read+0x3e3

Maybe it would be good to merge this fix into RELENG_9_1 and distribute it via freebsd-update. What do you think?

Best,
-dennis

On 16.05.2013 at 11:42, dennis berger wrote:

> This is indeed a ZFS+NFS system, and I can see that istgt and nfs are stuck in
> some ZIO state. Maybe it's this.
> Thanks for pointing it out.
>
> Is it this ZFS+NFS deadlock?
>
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
> @@ -3720,8 +3720,16 @@ arc_lowmem(void *arg __unused, int howto __unused)
>          mutex_enter(&arc_reclaim_thr_lock);
>          needfree = 1;
>          cv_signal(&arc_reclaim_thr_cv);
> -        while (needfree)
> -                msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> +
> +        /*
> +         * It is unsafe to block here in arbitrary threads, because we can come
> +         * here from ARC itself and may hold ARC locks and thus risk a deadlock
> +         * with ARC reclaim thread.
> +         */
> +        if (curproc == pageproc) {
> +                while (needfree)
> +                        msleep(&needfree, &arc_reclaim_thr_lock, 0, "zfs:lowmem", 0);
> +        }
>          mutex_exit(&arc_reclaim_thr_lock);
>          mutex_exit(&arc_lowmem_lock);
>  }
>
> I'll try to crash our test system. I assume that heavily stressing NFS backed
> by ZFS might trigger this bug?
>
> -dennis
>
>
> On 16.05.2013 at 00:03, Steven Hartland wrote:
>
>> ----- Original Message ----- From: "dennis berger" <d...@nipsi.de>
>>> FreeBSD 9.1-RELEASE FreeBSD 9.1-RELEASE #0 r243825: Tue Dec 4 09:23:10
>>> UTC 2012
>>>
>>>> 3. Regarding this:
>>>>>> A clean shutdown isn't possible though. It hangs after vnode
>>>>>> cleaning, normally you would see detaching of usb devices here, or
>>>>>> other devices maybe?
>>>> Please don't conflate this with your above issue. This is almost
>>>> certainly unrelated. Please start a new thread about that if desired.
>>>
>>> Maybe this is a misunderstanding. Normally this system will shut down
>>> cleanly, of course.
>>> This hang only appears after the network problem above.
>>
>> If this is a ZFS system, it's a known issue which is fixed in current,
>> stable-9, stable-8 and the upcoming 8.4 release.
>>
>> If not and you have USB devices, see if the following sysctl helps:
>> hw.usb.no_shutdown_wait=1
>>
>> Regards
>> Steve
>>
>> ================================================
>> This e.mail is private and confidential between Multiplay (UK) Ltd. and the
>> person or entity to whom it is addressed. In the event of misdirection, the
>> recipient is prohibited from using, copying, printing or otherwise
>> disseminating it or any information contained in it.
>> In the event of misdirection, illegible or incomplete transmission please
>> telephone +44 845 868 1337
>> or return the E.mail to postmas...@multiplay.co.uk.
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
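
For anyone who wants to check their own hourly reports for the same signature, here is a minimal sketch, not a tested tool. It assumes the report holds plain procstat -kk output, one thread per line, with PID, TID and COMM as the first three fields. The function name find_arc_lowmem_stalls is made up for this example. It prints every thread whose kernel stack shows arc_lowmem while the thread is itself inside arc_get_data_buf, which is the pattern in the nfsd trace logged above.

```sh
#!/bin/sh
# Hypothetical helper, not part of the original report script.
# $1: path to a report file containing procstat -kk output.
# Prints "PID TID COMM" for each thread whose stack contains both
# arc_lowmem and arc_get_data_buf, i.e. a thread that entered the
# low-memory handler from inside an ARC buffer allocation.
find_arc_lowmem_stalls() {
    awk '/arc_lowmem[+]/ && /arc_get_data_buf[+]/ { print $1, $2, $3 }' "$1"
}
```

It could be called as find_arc_lowmem_stalls "$reportname" after each hourly run. A match only shows that a thread was in that code path when the sample was taken; seeing the same thread there in consecutive reports is a stronger hint that it is really stuck.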