https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=219224
Bug ID: 219224
Summary: umount hanging in _cv_wait zil_commit zfs_sync sys_sync
Product: Base System
Version: 11.0-RELEASE
Hardware: amd64
OS: Any
Status: New
Severity: Affects Only Me
Priority: ---
Component: kern
Assignee: freebsd-bugs@FreeBSD.org
Reporter: t...@sdf.org

I'm observing hanging processes on a ZFS system, apparently after umount
tried to sync data before unmounting a jail's devfs:

tools-1-b# service jail restart git-1-b
Stopping jails: git-1-b
load: 1.77  cmd: umount 79620 [zilog->zl_cv_batch[0]] 568.08r 0.00u 0.01s 0% 2000k
load: 1.65  cmd: umount 79620 [zilog->zl_cv_batch[0]] 575.16r 0.00u 0.01s 0% 2000k
load: 2.32  cmd: umount 79620 [zilog->zl_cv_batch[0]] 580.44r 0.00u 0.01s 0% 2000k

tools-1-b# procstat -kk 79620
  PID    TID COMM   TDNAME KSTACK
79620 100216 umount -      mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194
  zil_commit+0x85 zfs_sync+0xad sys_sync+0x146 amd64_syscall+0x4ce
  Xfast_syscall+0xfb

tools-1-b# kgdb
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols found)...
Attempt to extract a component of a value that is not a structure pointer.
Attempt to extract a component of a value that is not a structure pointer.
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) tid 100216
[Switching to thread 2772 (Thread 100216)]
#0  0xffffffff80b0cafb in sched_switch ()
(kgdb) bt
#0  0xffffffff80b0cafb in sched_switch ()
#1  0xffffffff80ae4ae2 in mi_switch ()
#2  0xffffffff80b3226a in sleepq_wait ()
#3  0xffffffff80a66464 in _cv_wait ()
#4  0xffffffff82252945 in ?? ()
#5  0xfffffe201d5308e0 in ?? ()
#6  0xfffff805d1af7500 in ?? ()
#7  0xfffffe201d530870 in ?? ()
#8  0xffffffff80bae521 in __mnt_vnode_markerfree_all ()
#9  0xffffffff8227aded in ?? ()
#10 0x00000000fffff7ff in ?? ()
#11 0xfffffe201d530968 in ?? ()
#12 0xfffffe201d530990 in ?? ()
#13 0xffffffff80bb48a6 in sys_sync ()
Previous frame identical to this frame (corrupt stack?)
(kgdb)

Two other processes are hanging too:

tools-1-b# ps ax | awk '$3~"D"&&$5~"^/"'
36031  -  DsJ  0:00.49 /var/cfengine/bin/cf-agent -Dfrom_cfexecd,scheduled_
40395  -  DJ   0:00.01 /var/cfengine/bin/cf-promises -c /var/cfengine/input
79620  0  D+   0:00.01 /sbin/umount /jails/git-1-b/dev

tools-1-b# procstat -kk 36031
  PID    TID COMM     TDNAME KSTACK
36031 103064 cf-agent -      mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194
  zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d
  vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
  vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
  vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
  vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
  vinactive+0x17d

tools-1-b# procstat -kk 40395
  PID    TID COMM        TDNAME KSTACK
40395 103070 cf-promises -      mi_switch+0xd2 sleepq_wait+0x3a _cv_wait+0x194
  zil_commit+0x85 zfs_freebsd_putpages+0x71e VOP_PUTPAGES_APV+0x8d
  vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
  vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
  vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
  vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
  vinactive+0x17d

Everything is on ZFS, with both the mothership and the jail in
zroot/jails/cfengine-1-b sharing pool "zroot":

tools-1-b# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
data   2.91T  1.43M  2.91T         -    0%   0%  2.14x  ONLINE  -
zroot   744G  19.6G   724G         -   19%   2%  1.60x  ONLINE  -

I observe another process reported at 100% CPU usage, which is highly
unusual; it is also inside the cfengine jail:

tools-1-b# ps w 2988
 PID TT  STAT      TIME COMMAND
2988  -  TsJ  118:40.91 /usr/local/sbin/cf-serverd

tools-1-b#
procstat -kk 2988
  PID    TID COMM       TDNAME KSTACK
 2988 100835 cf-serverd -      mi_switch+0xd2 thread_suspend_check+0x34a
  ast+0x5be doreti_ast+0x1f
 2988 100922 cf-serverd -      mi_switch+0xd2 thread_suspend_check+0x34a
  ast+0x5be doreti_ast+0x1f
 2988 100963 cf-serverd -      dmu_object_info_from_db+0x2d zfs_zget+0xde
  zfs_get_data+0x40 zil_commit+0x6a1 zfs_freebsd_putpages+0x71e
  VOP_PUTPAGES_APV+0x8d vnode_pager_putpages+0xe3 vm_pageout_flush+0xed
  vm_object_page_collect_flush+0x21a vm_object_page_clean+0x197
  vm_object_terminate+0x99 zfs_freebsd_reclaim+0x1e VOP_RECLAIM_APV+0x8f
  vgonel+0x2ef vrecycle+0x90 zfs_freebsd_inactive+0xd VOP_INACTIVE_APV+0x8d
  vinactive+0x17d
 2988 100966 cf-serverd -      mi_switch+0xd2 thread_suspend_check+0x34a
  ast+0x5be doreti_ast+0x1f
 2988 100970 cf-serverd -      mi_switch+0xd2 thread_suspend_check+0x34a
  ast+0x5be doreti_ast+0x1f
 2988 100973 cf-serverd -      mi_switch+0xd2 thread_suspend_check+0x34a
  ast+0x5be doreti_ast+0x1f
[... ~1000 more threads with exactly "mi_switch+0xd2
 thread_suspend_check+0x34a ast+0x5be doreti_ast+0x1f"]

This thread, 100963, seems to be looping around; sampling its kernel stack
20 times and collapsing duplicates gives:

tools-1-b# {repeat 20 procstat -k 2988} | awk '$2=="100963"' | cut -d- -f3- | sort | uniq
 dbuf_hold_impl dbuf_hold dnode_hold_impl dmu_bonus_hold zfs_zget
   zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
   vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
   vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim
   VOP_RECLAIM_APV vgonel vrecycle
 dbuf_read dmu_bonus_hold zfs_zget zfs_get_data zil_commit
   zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive VOP_INACTIVE_APV
 dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit
   zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive
 dmu_bonus_hold zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages
   VOP_PUTPAGES_APV vnode_pager_putpages vm_pageout_flush
   vm_object_page_collect_flush vm_object_page_clean vm_object_terminate
   zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive
   VOP_INACTIVE_APV vinactive
 dmu_object_info_from_db zfs_zget zfs_get_data zil_commit
   zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive VOP_INACTIVE_APV vinactive
 dmu_object_info_from_dnode dmu_object_info_from_db zfs_zget zfs_get_data
   zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive VOP_INACTIVE_APV
 dmu_zfetch dbuf_read dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data
   zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
 dnode_hold_impl dmu_bonus_hold zfs_zget zfs_get_data zil_commit
   zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive VOP_INACTIVE_APV
 dnode_rele dmu_bonus_hold zfs_zget zfs_get_data zil_commit
   zfs_freebsd_putpages VOP_PUTPAGES_APV vnode_pager_putpages
   vm_pageout_flush vm_object_page_collect_flush vm_object_page_clean
   vm_object_terminate zfs_freebsd_reclaim VOP_RECLAIM_APV vgonel vrecycle
   zfs_freebsd_inactive VOP_INACTIVE_APV
 zfs_zget zfs_get_data zil_commit zfs_freebsd_putpages VOP_PUTPAGES_APV
   vnode_pager_putpages vm_pageout_flush vm_object_page_collect_flush
   vm_object_page_clean vm_object_terminate zfs_freebsd_reclaim
   VOP_RECLAIM_APV vgonel vrecycle zfs_freebsd_inactive VOP_INACTIVE_APV
   vinactive vputx

Only the processes listed above are affected; there are ~200 other processes
running (and writing to the zroot pool).

-- 
You are receiving this mail because:
You are the assignee for the bug.
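A side note on the sampling one-liner above: `{repeat 20 procstat -k 2988}` is zsh syntax. A portable /bin/sh sketch of the same idea (the `sample_stacks` helper name is mine; PID 2988 and TID 100963 are the values from this report) could look like:

```shell
# Sketch, assuming FreeBSD's procstat(1): snapshot one thread's kernel
# stack n times and collapse duplicates, so a thread spinning through the
# same code path shows up as a small set of distinct stacks instead of
# twenty near-identical dumps. The helper name is hypothetical.
sample_stacks() {
    pid=$1 tid=$2 n=$3
    i=0
    while [ "$i" -lt "$n" ]; do
        procstat -k "$pid"            # one stack snapshot per iteration
        i=$((i + 1))
    done | awk -v tid="$tid" '$2 == tid' \
         | cut -d- -f3- | sort | uniq  # keep KSTACK field, dedupe
}

# Usage (on the affected host):
#   sample_stacks 2988 100963 20
```

Each distinct stack then prints once, which makes a tight loop through zil_commit/zfs_zget easy to spot.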