Dennis Clarke wrote:
CTRL+C does nothing and kill -9 pid does nothing to this command.
feels like a bug to me
Yes, it is:
http://bugs.opensolaris.org/view_bug.do?bug_id=6758902
Now I recall why I had to reboot. Seems as if a lot of commands hang now.
Things like :
df -ak
zfs list
pid of 'zfs list' is 1754
zpool list
pid of 'zpool list' is 1873
they all just hang.
OK, here are a few more details:
pid of 'zpool import' is 1361.
> 0t1361::pid2proc|::walk thread|::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
ec223c94 swtch+0x188()
ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
ec223d84 spa_import_common+0x3cf()
ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
ec223e14 zfsdev_ioctl+0xe0()
ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223f84 ioctl+0x171()
ec223fac sys_call+0x10c()
>
So we see it is waiting for a transaction group to sync. Let's find the
sync thread and see what it is doing. For this we take the first argument
to txg_wait_synced() (the dsl_pool_t pointer) and go from there:
> eb4147c0::print dsl_pool_t dp_tx.tx_sync_thread
dp_tx.tx_sync_thread = 0xeb5c9dc0
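As an aside, picking that argument out of a stack by hand is easy to get
wrong with this many addresses on screen. Below is a minimal Python sketch
that pulls the first argument of a named frame out of saved ::findstack -v
output and prints the mdb command to run next. The helper names
(parse_frame, first_arg_of) are my own and are not part of mdb, and the
frame format is assumed to be exactly what is shown in this message.

```python
import re

# One `::findstack -v` frame line: frame pointer, function+offset, argument list.
FRAME_RE = re.compile(
    r'^(?P<fp>[0-9a-f]+)\s+'
    r'(?P<func>[\w`.]+)\+(?P<off>(?:0x)?[0-9a-f]+)'
    r'\((?P<args>[^)]*)\)\s*$'
)

def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    args = [a.strip() for a in m.group('args').split(',') if a.strip()]
    return {'fp': m.group('fp'), 'func': m.group('func'),
            'offset': m.group('off'), 'args': args}

def first_arg_of(stack_text, func_name):
    """Return the first argument of the first frame calling func_name, if any."""
    for line in stack_text.splitlines():
        frame = parse_frame(line)
        if frame and frame['func'] == func_name and frame['args']:
            return frame['args'][0]
    return None

# Top of the 'zpool import' stack, copied from the output above.
STACK = """\
ec223c94 swtch+0x188()
ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
"""

dp = first_arg_of(STACK, 'txg_wait_synced')
# Build the next mdb command from the dsl_pool_t pointer.
print(f"{dp}::print dsl_pool_t dp_tx.tx_sync_thread")
```

The printed line is the same command used above to find the sync thread.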
Let's see what it is doing:
> 0xeb5c9dc0::findstack -v
stack pointer for thread eb5c9dc0: eb5c9a28
eb5c9a68 swtch+0x188()
eb5c9a78 cv_wait+0x53(d38e1828, d38e1820, 0, 0)
eb5c9aa8 zio_wait+0x55(d38e15e0, d38e15e0, eb5c9af8, 1)
eb5c9ae8 dbuf_read+0x201(d8795028, 0)
eb5c9b08 dbuf_will_dirty+0x30(d8795028, ea7fa880, eb5c9b4c, 0)
eb5c9b68 dmu_write+0xd7(ec5b2e18, 22, 0, 1f0, 0, 10)
eb5c9c08 space_map_sync+0x304(e0815db8, 1, e0815c20, ec5b2e18, ea7fa880)
eb5c9c78 metaslab_sync+0x284(e0815c00, 65a, 0, 0)
eb5c9cb8 vdev_sync+0xc6(e6f8b4c0, 65a, 0)
eb5c9d28 spa_sync+0x3d0(ddbda380, 65a, 0, 772f6e75)
eb5c9da8 txg_sync_thread+0x308(eb4147c0, 0)
eb5c9db8 thread_start+8()
>
It is trying to write (update) a space map, so it needs to read in the
existing content first, and this read fails. Let's check what the block
pointer is:
> d38e15e0::zio -c
ADDRESS TYPE STAGE WAITER
d397baa0 READ DONE -
> d397baa0::print zio_t io_bp|::blkptr
DVA[0]: vdev_id 0 / 6091e400
DVA[0]: GANG: FALSE GRID: 0000 ASIZE: 20000000000
DVA[0]: :0:6091e400:200:d
DVA[1]: vdev_id 0 / 3e091e400
DVA[1]: GANG: FALSE GRID: 0000 ASIZE: 20000000000
DVA[1]: :0:3e091e400:200:d
DVA[2]: vdev_id 0 / 78091e400
DVA[2]: GANG: FALSE GRID: 0000 ASIZE: 20000000000
DVA[2]: :0:78091e400:200:d
LSIZE: 1000 PSIZE: 200
ENDIAN: LITTLE TYPE: SPA space map
BIRTH: 5b5 LEVEL: 0 FILL: 100000000
CKFUNC: fletcher4 COMP: lzjb
CKSUM: 1f7bc0ee12:6fcfd90640d:10787c83addaf:1f3ef97a921b6f
>
We clearly see here that the block pointer is the same as the one reported
as corrupted by zdb (see the other thread about zdb -e -bbcsL).
Let's check that this is indeed the same pool we are trying to import:
> eb4147c0::print dsl_pool_t dp_spa|::print -d struct spa spa_load_guid
spa_load_guid = 0t15989070886807735056
> ::ps -f! grep 1361
R 1361 1360 1353 1220 0 0x4a004000 e0596058 zpool import -f -R /mnt/foo 15989070886807735056
Indeed, it is the same.
Let's check why other processes are stuck:
> 0t1783::pid2proc|::walk thread|::findstack -v
stack pointer for thread d362a680: eab3ac94
eab3acd4 swtch+0x188()
eab3ad24 turnstile_block+0x70b(d7b09f78, 0, fecb5418, fec04c80, 0, 0)
eab3ad94 mutex_vector_enter+0x28f(fecb5418)
eab3adc4 spa_all_configs+0x50(e6f9dd38, e6f9d000, 1020, 1)
eab3ade4 zfs_ioc_pool_configs+0x16(e6f9d000, e8e3c010, 1020)
eab3ae14 zfsdev_ioctl+0xe0()
eab3ae44 cdev_ioctl+0x31(2d80000, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
eab3ae74 spec_ioctl+0x6b(da7ce340, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
eab3aec4 fop_ioctl+0x49(da7ce340, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
eab3af84 ioctl+0x171()
eab3afac sys_sysenter+0x106()
> fecb5418::mutex
ADDR TYPE HELD MINSPL OLDSPL WAITERS
fecb5418 adapt ec1b8ce0 - - yes
> ec1b8ce0::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
ec223c94 swtch+0x188()
ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
ec223d84 spa_import_common+0x3cf()
ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
ec223e14 zfsdev_ioctl+0xe0()
ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223f84 ioctl+0x171()
ec223fac sys_call+0x10c()
>
Ok, we see that 'zpool list' is waiting for our 'zpool import ...' to
complete.
Let's see what 'zfs list' is waiting for:
> 0t1754::pid2proc|::walk thread|::findstack -v
stack pointer for thread e6f8d880: eaf7cc94
eaf7ccd4 swtch+0x188()
eaf7cd24 turnstile_block+0x70b(d7b09f78, 0, fecb5418, fec04c80, 0, 0)
eaf7cd94 mutex_vector_enter+0x28f(fecb5418)
eaf7cdc4 spa_all_configs+0x50(e7c05d38, e7c05000, 1020, 1)
eaf7cde4 zfs_ioc_pool_configs+0x16(e7c05000, e8e3c010, 1020)
eaf7ce14 zfsdev_ioctl+0xe0()
eaf7ce44 cdev_ioctl+0x31(2d80000, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
eaf7ce74 spec_ioctl+0x6b(da7ce340, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
eaf7cec4 fop_ioctl+0x49(da7ce340, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
eaf7cf84 ioctl+0x171()
eaf7cfac sys_sysenter+0x106()
> fecb5418::mutex
ADDR TYPE HELD MINSPL OLDSPL WAITERS
fecb5418 adapt ec1b8ce0 - - yes
> ec1b8ce0::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
ec223c94 swtch+0x188()
ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
ec223d84 spa_import_common+0x3cf()
ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
ec223e14 zfsdev_ioctl+0xe0()
ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
ec223f84 ioctl+0x171()
ec223fac sys_call+0x10c()
>
The same story.
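To keep the chain of waiters straight, here is a small Python sketch that
writes down the wait-for edges seen in the stacks above and follows them
from any stuck thread to the one at the end of the chain. The table is
transcribed by hand from the mdb output in this message; the WAITS layout
and the chain() helper are only my own illustration.

```python
# Wait-for edges read off the mdb output in this message:
# thread address -> (what it is waiting on, thread that has to make progress)
WAITS = {
    'd362a680': ('mutex fecb5418', 'ec1b8ce0'),  # pid 1783, in spa_all_configs()
    'e6f8d880': ('mutex fecb5418', 'ec1b8ce0'),  # pid 1754, in spa_all_configs()
    'ec1b8ce0': ('txg 65a synced', 'eb5c9dc0'),  # pid 1361, in txg_wait_synced()
    'eb5c9dc0': ('zio d38e15e0', None),          # sync thread, in zio_wait()
}

def chain(thread):
    """Follow wait-for edges from a thread until nobody else is being waited on."""
    path = []
    seen = set()
    while thread is not None and thread not in seen:
        seen.add(thread)
        resource, owner = WAITS.get(thread, (None, None))
        path.append((thread, resource))
        thread = owner
    return path

for thread, resource in chain('e6f8d880'):
    print(f"{thread} waits on {resource}")
```

Starting from either of the two blocked threads, the walk ends at the sync
thread eb5c9dc0 sitting in zio_wait() on the space map read.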
What is the mutex they are waiting for?
> fecb5418::whatis
fecb5418 is spa_namespace_lock+0 in zfs's bss
>
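It is spa_namespace_lock, the global lock protecting the pool namespace,
so every command that needs the list of pools queues up behind the
stuck import.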
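The pattern itself (one thread sleeping on a condition while holding a
global lock that everyone else needs) can be reproduced in a few lines of
user-level Python. This is only a toy model under my own assumptions, not
the ZFS code: threading.Lock stands in for spa_namespace_lock, an Event
stands in for "the txg has been synced", and the 0.2 second acquire
timeout exists only so the demo finishes. The kernel mutex has no such
timeout, which is why the real commands hang indefinitely.

```python
import threading

def run_demo():
    namespace_lock = threading.Lock()    # stands in for spa_namespace_lock
    txg_synced = threading.Event()       # stands in for "txg has been synced"
    import_has_lock = threading.Event()  # lets the demo order its steps
    results = []

    def pool_import():
        # Take the global lock, then sleep waiting for the sync to finish,
        # still holding the lock (like txg_wait_synced() under spa_import).
        with namespace_lock:
            import_has_lock.set()
            txg_synced.wait()

    def pool_list():
        # Needs the same lock just to enumerate pools.
        got = namespace_lock.acquire(timeout=0.2)
        results.append(got)
        if got:
            namespace_lock.release()

    importer = threading.Thread(target=pool_import)
    importer.start()
    import_has_lock.wait()

    # While the sync has not completed, the lister cannot get the lock.
    lister = threading.Thread(target=pool_list)
    lister.start()
    lister.join()

    # Pretend the sync finally completes; the import drops the lock.
    txg_synced.set()
    importer.join()
    pool_list()

    blocked_during_import = not results[0]
    acquired_after_sync = results[1]
    return blocked_during_import, acquired_after_sync

print(run_demo())
```

In the hang described above the "synced" event never arrives, because the
sync thread itself is stuck on the space map read, so the lock is never
released.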
Hth,
Victor