Thanks, I have filed the bug, but I don't know how to provide the crash dump. If the bug is accepted, RE can get the crash dump file from me.
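For the archive, here is a sketch of how I understand a crash dump is normally captured and handed over on OpenSolaris; the paths below are just the common defaults, so check the dumpadm output on the actual machine:

    # Show the current dump configuration (dump device, savecore directory)
    dumpadm

    # Capture a live dump of the running system without panicking it;
    # files land in the savecore directory, typically /var/crash/<hostname>
    savecore -L

    # Expand a compressed vmdump.N into unix.N/vmcore.N so mdb can read it
    savecore -f /var/crash/`hostname`/vmdump.0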
On Thu, Jul 15, 2010 at 10:33 PM, George Wilson <george.r.wil...@oracle.com> wrote:
> I don't recall seeing this issue before. Best thing to do is file a bug and
> include a pointer to the crash dump.
>
> - George
>
> zhihui Chen wrote:
>>
>> It looks like the txg_sync_thread for this pool has blocked and never
>> returned, which has in turn blocked many other threads. I have tried
>> changing zfs_vdev_max_pending from 10 to 35 and retested the workload
>> several times; with that setting this issue does not happen. But if I
>> change it back to 10, it happens very easily. Is this a known bug, and
>> is there any suggestion for solving it?
>>
>>> ffffff0502c3378c::wchaninfo -v
>> ADDR             TYPE NWAITERS   THREAD           PROC
>> ffffff0502c3378c cond     1730:  ffffff051cc6b500 go_filebench
>>                                  ffffff051ce61020 go_filebench
>>                                  ffffff051cc4e4e0 go_filebench
>>                                  ffffff051d115120 go_filebench
>>                                  ffffff051e9ed000 go_filebench
>>                                  ffffff051bf644c0 go_filebench
>>                                  ffffff051c65b000 go_filebench
>>                                  ffffff051c728500 go_filebench
>>                                  ffffff050d83a8c0 go_filebench
>>                                  ffffff051c528c00 go_filebench
>>                                  ffffff051b750800 go_filebench
>>                                  ffffff051cdd7520 go_filebench
>>                                  ffffff051ce71bc0 go_filebench
>>                                  ffffff051cb5e840 go_filebench
>>                                  ffffff051cbdec60 go_filebench
>>                                  ffffff0516473c60 go_filebench
>>                                  ffffff051d132820 go_filebench
>>                                  ffffff051d13a400 go_filebench
>>                                  ffffff050fbf0b40 go_filebench
>>                                  ffffff051ce7a400 go_filebench
>>                                  ffffff051b781820 go_filebench
>>                                  ffffff051ce603e0 go_filebench
>>                                  ffffff051d1bf840 go_filebench
>>                                  ffffff051c6c24c0 go_filebench
>>                                  ffffff051d204100 go_filebench
>>                                  ffffff051cbdf160 go_filebench
>>                                  ffffff051ce52c00 go_filebench
>>                                  .......
>>
>>> ffffff051cc6b500::findstack -v
>> stack pointer for thread ffffff051cc6b500: ffffff0020a76ac0
>> [ ffffff0020a76ac0 _resume_from_idle+0xf1() ]
>>   ffffff0020a76af0 swtch+0x145()
>>   ffffff0020a76b20 cv_wait+0x61(ffffff0502c3378c, ffffff0502c33700)
>>   ffffff0020a76b70 zil_commit+0x67(ffffff0502c33700, 6b255, 14)
>>   ffffff0020a76d80 zfs_write+0xaaf(ffffff050b5c9140, ffffff0020a76e40, 40, ffffff0502dab258, 0)
>>   ffffff0020a76df0 fop_write+0x6b(ffffff050b5c9140, ffffff0020a76e40, 40, ffffff0502dab258, 0)
>>   ffffff0020a76ec0 pwrite64+0x244(1a, b6f2a000, 800, b841a800, 0)
>>   ffffff0020a76f10 sys_syscall32+0xff()
>>
>> From the zil_commit code, I tried to find the thread whose stack
>> contains a call to zil_commit_writer. That thread has not returned
>> from zil_commit_writer, so it never calls cv_broadcast to wake up
>> the waiting threads.
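A quicker way to locate that writer thread than walking 1730 waiter stacks by hand might be mdb's ::stacks dcmd, assuming the b141 mdb has it; it deduplicates threads and can filter by a function name on the stack:

    > ::stacks -c zil_commit_writer

The thread below is the one such a filter should single out.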
>>> ffffff051d10fba0::findstack -v
>> stack pointer for thread ffffff051d10fba0: ffffff0021ab9a10
>> [ ffffff0021ab9a10 _resume_from_idle+0xf1() ]
>>   ffffff0021ab9a40 swtch+0x145()
>>   ffffff0021ab9a70 cv_wait+0x61(ffffff051ae1b988, ffffff051ae1b980)
>>   ffffff0021ab9ab0 zio_wait+0x5d(ffffff051ae1b680)
>>   ffffff0021ab9b20 zil_commit_writer+0x249(ffffff0502c33700, 6b250, e)
>>   ffffff0021ab9b70 zil_commit+0x91(ffffff0502c33700, 6b250, e)
>>   ffffff0021ab9d80 zfs_write+0xaaf(ffffff050b5c9540, ffffff0021ab9e40, 40, ffffff0502dab258, 0)
>>   ffffff0021ab9df0 fop_write+0x6b(ffffff050b5c9540, ffffff0021ab9e40, 40, ffffff0502dab258, 0)
>>   ffffff0021ab9ec0 pwrite64+0x244(14, bfbfb800, 800, 88f3f000, 0)
>>   ffffff0021ab9f10 sys_syscall32+0xff()
>>
>>> ffffff051ae1b680::zio -r
>> ADDRESS          TYPE  STAGE            WAITER
>> ffffff051ae1b680 NULL  CHECKSUM_VERIFY  ffffff051d10fba0
>>  ffffff051a9c1978 WRITE VDEV_IO_START   -
>>  ffffff052454d348 WRITE VDEV_IO_START   -
>>  ffffff051572b960 WRITE VDEV_IO_START   -
>>  ffffff050accb330 WRITE VDEV_IO_START   -
>>  ffffff0514453c80 WRITE VDEV_IO_START   -
>>  ffffff0524537648 WRITE VDEV_IO_START   -
>>  ffffff05090e9660 WRITE VDEV_IO_START   -
>>  ffffff05151cb698 WRITE VDEV_IO_START   -
>>  ffffff0514668658 WRITE VDEV_IO_START   -
>>  ffffff0514835690 WRITE VDEV_IO_START   -
>>  ffffff05198979a0 WRITE VDEV_IO_START   -
>>  ffffff0507e1d038 WRITE VDEV_IO_START   -
>>  ffffff0510727028 WRITE VDEV_IO_START   -
>>  ffffff0523a25018 WRITE VDEV_IO_START   -
>>  ffffff0523d729c0 WRITE VDEV_IO_START   -
>>  ffffff052465b990 WRITE VDEV_IO_START   -
>>  ffffff052395f008 WRITE DONE            -
>>  ffffff0514cbc350 WRITE VDEV_IO_START   -
>>  ffffff05146f2688 WRITE VDEV_IO_START   -
>>  ffffff0509454048 WRITE VDEV_IO_START   -
>>  ffffff0524186038 WRITE VDEV_IO_START   -
>>  ffffff051166e9a0 WRITE DONE            -
>>  ffffff0515256960 WRITE VDEV_IO_START   -
>>  ffffff0518edf010 WRITE VDEV_IO_START   -
>>  ffffff0514b2f688 WRITE VDEV_IO_START   -
>>  ffffff05158b4040 WRITE VDEV_IO_START   -
>>  ffffff052448d648 WRITE DONE            -
>>  ffffff0512354380 WRITE VDEV_IO_START   -
>>  ffffff051aafe6a0 WRITE VDEV_IO_START   -
>>  ffffff051524e350 WRITE VDEV_IO_START   -
>>  ffffff051a707058 WRITE VDEV_IO_START   -
>>  ffffff0524679c88 WRITE DONE            -
>>  ffffff051acef058 WRITE DONE            -
>>
>>> ffffff051acef058::print zio_t io_executor
>> io_executor = 0xffffff002089ac40
>>
>>> 0xffffff002089ac40::findstack -v
>> stack pointer for thread ffffff002089ac40: ffffff002089a720
>> [ ffffff002089a720 _resume_from_idle+0xf1() ]
>>   ffffff002089a750 swtch+0x145()
>>   ffffff002089a800 turnstile_block+0x760(ffffff051d186418, 0, ffffff051fcf0340, fffffffffbc07db8, 0, 0)
>>   ffffff002089a860 mutex_vector_enter+0x261(ffffff051fcf0340)
>>   ffffff002089a890 txg_rele_to_sync+0x2a(ffffff05121bece8)
>>   ffffff002089a8c0 dmu_tx_commit+0xee(ffffff05121bec98)
>>   ffffff002089a8f0 zil_lwb_write_done+0x5f(ffffff051acef058)
>>   ffffff002089a960 zio_done+0x383(ffffff051acef058)
>>   ffffff002089a990 zio_execute+0x8d(ffffff051acef058)
>>   ffffff002089a9f0 zio_notify_parent+0xa6(ffffff051acef058, ffffff052391b9b8, 1)
>>   ffffff002089aa60 zio_done+0x3e2(ffffff052391b9b8)
>>   ffffff002089aa90 zio_execute+0x8d(ffffff052391b9b8)
>>   ffffff002089ab30 taskq_thread+0x248(ffffff050c418910)
>>   ffffff002089ab40 thread_start+8()
>>
>>> ffffff05121bece8::print -t txg_handle_t
>> txg_handle_t {
>>     tx_cpu_t *th_cpu = 0xffffff051fcf0340
>>     uint64_t th_txg = 0xf36
>> }
>>
>>> ffffff051fcf0340::mutex
>>             ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
>> ffffff051fcf0340 adapt ffffff050dc5d3a0      -      -     yes
>>
>>> ffffff050dc5d3a0::findstack -v
>> stack pointer for thread ffffff050dc5d3a0: ffffff0023589970
>> [ ffffff0023589970 _resume_from_idle+0xf1() ]
>>   ffffff00235899a0 swtch+0x145()
>>   ffffff0023589a50 turnstile_block+0x760(ffffff051ce0c948, 0, ffffff05083403c8, fffffffffbc07db8, 0, 0)
>>   ffffff0023589ab0 mutex_vector_enter+0x261(ffffff05083403c8)
>>   ffffff0023589b30 dmu_tx_try_assign+0xab(ffffff0514395018, 2)
>>   ffffff0023589b70 dmu_tx_assign+0x2a(ffffff0514395018, 2)
>>   ffffff0023589d80 zfs_write+0x65f(ffffff050b5c9640, ffffff0023589e40, 40, ffffff0502dab258, 0)
>>   ffffff0023589df0 fop_write+0x6b(ffffff050b5c9640, ffffff0023589e40, 40, ffffff0502dab258, 0)
>>   ffffff0023589ec0 pwrite64+0x244(16, b6f7c000, 800, a7ef7800, 0)
>>   ffffff0023589f10 sys_syscall32+0xff()
>>
>>> ffffff0514395018::print dmu_tx_t
>> {
>>     tx_holds = {
>>         list_size = 0x50
>>         list_offset = 0x8
>>         list_head = {
>>             list_next = 0xffffff0508054840
>>             list_prev = 0xffffff050da3b1f8
>>         }
>>     }
>>     tx_objset = 0xffffff05028c8940
>>     tx_dir = 0xffffff04e7785400
>>     tx_pool = 0xffffff0502ceac00
>>     tx_txg = 0xf36
>>     tx_lastsnap_txg = 0x1
>>     tx_lasttried_txg = 0
>>     tx_txgh = {
>>         th_cpu = 0xffffff051fcf0340
>>         th_txg = 0xf36
>>     }
>>     tx_tempreserve_cookie = 0
>>     tx_needassign_txh = 0
>>     tx_callbacks = {
>>         list_size = 0x20
>>         list_offset = 0
>>         list_head = {
>>             list_next = 0xffffff0514395098
>>             list_prev = 0xffffff0514395098
>>         }
>>     }
>>     tx_anyobj = 0
>>     tx_err = 0
>> }
>>
>>> ffffff05083403c8::mutex
>>             ADDR  TYPE             HELD MINSPL OLDSPL WAITERS
>> ffffff05083403c8 adapt ffffff002035cc40      -      -     yes
>>
>>> ffffff002035cc40::findstack -v
>> stack pointer for thread ffffff002035cc40: ffffff002035c590
>> [ ffffff002035c590 _resume_from_idle+0xf1() ]
>>   ffffff002035c5c0 swtch+0x145()
>>   ffffff002035c5f0 cv_wait+0x61(ffffff05123ce350, ffffff05123ce348)
>>   ffffff002035c630 zio_wait+0x5d(ffffff05123ce048)
>>   ffffff002035c690 dbuf_read+0x1e8(ffffff0509c758e0, 0, a)
>>   ffffff002035c710 dmu_buf_hold+0xac(ffffff05028c8940, ffffffffffffffff, 0, 0, ffffff002035c748, 1)
>>   ffffff002035c7b0 zap_lockdir+0x6d(ffffff05028c8940, ffffffffffffffff, 0, 1, 1, 0, ffffff002035c7d8)
>>   ffffff002035c840 zap_lookup_norm+0x55(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, 8, 1, ffffff002035c8b8, 0, 0, 0, 0)
>>   ffffff002035c8a0 zap_lookup+0x2d(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, 8, 1, ffffff002035c8b8)
>>   ffffff002035c910 zap_increment+0x64(ffffff05028c8940, ffffffffffffffff, ffffff002035c920, fffffffeffef7e00, ffffff0511d9bc80)
>>   ffffff002035c990 zap_increment_int+0x68(ffffff05028c8940, ffffffffffffffff, 0, fffffffeffef7e00, ffffff0511d9bc80)
>>   ffffff002035c9f0 do_userquota_update+0x69(ffffff05028c8940, 100108000, 3, 0, 0, 1, ffffff0511d9bc80)
>>   ffffff002035ca50 dmu_objset_do_userquota_updates+0xde(ffffff05028c8940, ffffff0511d9bc80)
>>   ffffff002035cad0 dsl_pool_sync+0x112(ffffff0502ceac00, f34)
>>   ffffff002035cb80 spa_sync+0x37b(ffffff0501269580, f34)
>>   ffffff002035cc20 txg_sync_thread+0x247(ffffff0502ceac00)
>>   ffffff002035cc30 thread_start+8()
>>
>>> ffffff05123ce048::zio -r
>> ADDRESS          TYPE  STAGE            WAITER
>> ffffff05123ce048 NULL  CHECKSUM_VERIFY  ffffff002035cc40
>>  ffffff051a9a9338 READ  VDEV_IO_START   -
>>  ffffff050e3a4050 READ  VDEV_IO_DONE    -
>>  ffffff0519173c90 READ  VDEV_IO_START   -
>>
>>> ffffff0519173c90::print zio_t io_done
>> io_done = vdev_cache_fill
>>
>> The zio ffffff0519173c90 is a vdev cache read request that never
>> completes, so txg_sync_thread is blocked. I don't know why this zio
>> cannot be satisfied and reach the DONE stage.
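For anyone who ends up with the crash dump, a couple of mdb lines that should show where that read is parked; the field names match the zio_t of this era, and <io_vd> is a placeholder for the pointer printed by the first command, so treat this as a sketch rather than exact syntax for every build:

    > ffffff0519173c90::print zio_t io_stage io_executor io_vd io_offset io_size
    > <io_vd>::print vdev_t vdev_path vdev_cache

The first line shows which pipeline stage the request is sitting in and which vdev it targets; the second resolves that vdev to a device path and dumps its vdev cache state.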
>> While zfs was hung I also tried dd against the raw device that backs
>> the pool, and that worked fine.
>>
>> Thanks
>> Zhihui
>>
>> On Mon, Jul 5, 2010 at 7:56 PM, zhihui Chen <zhch...@gmail.com> wrote:
>>>
>>> I tried to run "zfs list" on my system, but it looks like the command
>>> hangs. It does not return even if I press Ctrl+C:
>>>
>>> r...@intel7:/export/bench/io/filebench/results# zfs list
>>> ^C^C^C^C
>>> ^C^C^C^C
>>> ..
>>>
>>> When this happens, I am running the filebench benchmark with the oltp
>>> workload. But "zpool status" shows that all pools are in good status:
>>>
>>> r...@intel7:~# zpool status
>>>   pool: rpool
>>>  state: ONLINE
>>> status: The pool is formatted using an older on-disk format. The pool can
>>>         still be used, but some features are unavailable.
>>> action: Upgrade the pool using 'zpool upgrade'. Once this is done, the
>>>         pool will no longer be accessible on older software versions.
>>>   scan: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         rpool       ONLINE       0     0     0
>>>           c8t0d0s0  ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>>   pool: tpool
>>>  state: ONLINE
>>>   scan: none requested
>>> config:
>>>
>>>         NAME        STATE     READ WRITE CKSUM
>>>         tpool       ONLINE       0     0     0
>>>           c10t1d0   ONLINE       0     0     0
>>>
>>> errors: No known data errors
>>>
>>> My system is running B141, and tpool is using the latest pool version, 26.
>>> I tried "truss -p `pgrep zfs`", but it fails:
>>>
>>> r...@intel7:~# truss -p `pgrep zfs`
>>> truss: unanticipated system error: 5060
>>>
>>> It looks like zfs is in a deadlocked state, but I don't know the cause.
>>> I have run the filebench/oltp workload several times, and each time it
>>> leads to this state. But if I run filebench with other workloads such as
>>> fileserver or webserver, this issue does not happen.
>>>
>>> Thanks
>>> Zhihui
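To close the loop on the workaround mentioned at the top of the thread: this is roughly how zfs_vdev_max_pending gets changed, in case anyone wants to reproduce the experiment (standard ZFS tuning syntax; verify the symbol exists on your build first, e.g. with echo 'zfs_vdev_max_pending::print' | mdb -k):

    # Persistent across reboots: add to /etc/system
    set zfs:zfs_vdev_max_pending = 35

    # Live change on the running kernel; 0t35 is decimal 35 in mdb notation
    echo 'zfs_vdev_max_pending/W 0t35' | mdb -kw

With the default of 10 the hang reproduces easily here; at 35 it did not reproduce in several runs.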