Dennis Clarke wrote:
 CTRL+C does nothing and kill -9 pid does nothing to this command.

feels like a bug to me
Yes, it is:

http://bugs.opensolaris.org/view_bug.do?bug_id=6758902


Now I recall why I had to reboot.  Seems as if a lot of commands hang now.
Things like:

df -ak

zfs list

pid of 'zfs list' is 1754

zpool list

pid of 'zpool list' is 1873

they all just hang.

Ok, here are a few more details:

pid of 'zpool import' is 1361.

> 0t1361::pid2proc|::walk thread|::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
  ec223c94 swtch+0x188()
  ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
  ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
  ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
  ec223d84 spa_import_common+0x3cf()
  ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
  ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
  ec223e14 zfsdev_ioctl+0xe0()
  ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223f84 ioctl+0x171()
  ec223fac sys_call+0x10c()
>

So we see it is waiting for a transaction group to sync. Let's find the sync thread and see what it is doing. For this we take the first argument to txg_wait_synced() (the dsl_pool_t pointer) and go from there:

> eb4147c0::print dsl_pool_t dp_tx.tx_sync_thread
dp_tx.tx_sync_thread = 0xeb5c9dc0
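
If you are working from saved ::findstack -v output rather than a live mdb session, picking out the first argument of txg_wait_synced() can be sketched roughly as below. This is only an illustration: the regular expression and the helper names parse_frame and find_frame are my own (not part of mdb), and it assumes one frame per line in the form shown in this message.

```python
import re

# One frame per line: "<frame ptr> <function>+<offset>(<args>)".
# This pattern is an assumption based on the output in this message only.
FRAME_RE = re.compile(
    r"^\s*([0-9a-f]+)\s+([^\s+(]+)\+(0x[0-9a-f]+|[0-9a-f]+)\(([^)]*)\)"
)


def parse_frame(line):
    """Return a dict describing one stack frame, or None if the line is not a frame."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    args = [a.strip() for a in m.group(4).split(",") if a.strip()]
    return {
        "fp": m.group(1),
        "func": m.group(2),
        "offset": int(m.group(3), 16),  # int() accepts an optional 0x prefix here
        "args": args,
    }


def find_frame(lines, func):
    """Return the first parsed frame whose function name equals func, else None."""
    for line in lines:
        frame = parse_frame(line)
        if frame is not None and frame["func"] == func:
            return frame
    return None


if __name__ == "__main__":
    stack = [
        "stack pointer for thread ec1b8ce0: ec223c54",
        "  ec223c94 swtch+0x188()",
        "  ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)",
        "  ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)",
    ]
    frame = find_frame(stack, "txg_wait_synced")
    # First argument is the pointer printed as dsl_pool_t above;
    # the second one is the txg being waited for, in hex.
    print(frame["args"][0], int(frame["args"][1], 16))
```

The second argument (65a) is the same value that shows up as the txg argument of spa_sync() in the sync thread's stack.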

Let's see what the sync thread is doing:

> 0xeb5c9dc0::findstack -v
stack pointer for thread eb5c9dc0: eb5c9a28
  eb5c9a68 swtch+0x188()
  eb5c9a78 cv_wait+0x53(d38e1828, d38e1820, 0, 0)
  eb5c9aa8 zio_wait+0x55(d38e15e0, d38e15e0, eb5c9af8, 1)
  eb5c9ae8 dbuf_read+0x201(d8795028, 0)
  eb5c9b08 dbuf_will_dirty+0x30(d8795028, ea7fa880, eb5c9b4c, 0)
  eb5c9b68 dmu_write+0xd7(ec5b2e18, 22, 0, 1f0, 0, 10)
  eb5c9c08 space_map_sync+0x304(e0815db8, 1, e0815c20, ec5b2e18, ea7fa880)
  eb5c9c78 metaslab_sync+0x284(e0815c00, 65a, 0, 0)
  eb5c9cb8 vdev_sync+0xc6(e6f8b4c0, 65a, 0)
  eb5c9d28 spa_sync+0x3d0(ddbda380, 65a, 0, 772f6e75)
  eb5c9da8 txg_sync_thread+0x308(eb4147c0, 0)
  eb5c9db8 thread_start+8()
>

It is trying to write to (update) a space map, so it needs to read in the existing content first, and this fails. Let's check what the block pointer is:

> d38e15e0::zio -c
ADDRESS                          TYPE  STAGE            WAITER
 d397baa0                        READ  DONE             -
> d397baa0::print zio_t io_bp|::blkptr
DVA[0]: vdev_id 0 / 6091e400
DVA[0]:       GANG: FALSE  GRID:  0000  ASIZE: 20000000000
DVA[0]: :0:6091e400:200:d
DVA[1]: vdev_id 0 / 3e091e400
DVA[1]:       GANG: FALSE  GRID:  0000  ASIZE: 20000000000
DVA[1]: :0:3e091e400:200:d
DVA[2]: vdev_id 0 / 78091e400
DVA[2]:       GANG: FALSE  GRID:  0000  ASIZE: 20000000000
DVA[2]: :0:78091e400:200:d
LSIZE:  1000                            PSIZE: 200
ENDIAN: LITTLE                                  TYPE:  SPA space map
BIRTH:  5b5                LEVEL: 0     FILL:  100000000
CKFUNC: fletcher4                       COMP:  lzjb
CKSUM:  1f7bc0ee12:6fcfd90640d:10787c83addaf:1f3ef97a921b6f
>

We clearly see here that the block pointer is the same one that zdb reported as corrupted (see the other thread about zdb -e -bbcsL).
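
To compare the three copies of the block mechanically (for example against what zdb printed), the compact "DVA[n]: :..." lines from the ::blkptr output above can be split into fields. A rough sketch follows. The field order vdev:offset:size:flags is my reading of the output (the offset matches the "vdev_id 0 / ..." lines and the size matches PSIZE), not something stated in this message, and parse_dva is a hypothetical helper name.

```python
def parse_dva(line):
    """Split a compact DVA line such as 'DVA[0]: :0:6091e400:200:d'.

    Assumed field order (an interpretation, see the note above):
    vdev, offset (hex), size (hex), flags. Flags are left uninterpreted.
    """
    label, _, rest = line.partition(": :")
    vdev, offset, size, flags = rest.strip().split(":")
    return {
        "label": label.strip(),
        "vdev": vdev,
        "offset": int(offset, 16),
        "size": int(size, 16),
        "flags": flags,
    }


if __name__ == "__main__":
    lines = [
        "DVA[0]: :0:6091e400:200:d",
        "DVA[1]: :0:3e091e400:200:d",
        "DVA[2]: :0:78091e400:200:d",
    ]
    for dva in map(parse_dva, lines):
        print(dva["label"], dva["vdev"], hex(dva["offset"]), dva["size"], dva["flags"])
```

All three copies live on vdev 0 at different offsets and have the same size, which is what you would expect for a block stored in three copies on a single-vdev pool.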

Let's check that this is indeed the same pool we are trying to import:

> eb4147c0::print dsl_pool_t dp_spa|::print -d struct spa spa_load_guid
spa_load_guid = 0t15989070886807735056
> ::ps -f! grep 1361
R 1361 1360 1353 1220 0 0x4a004000 e0596058 zpool import -f -R /mnt/foo 15989070886807735056

Indeed, it is the same.

Let's check why other processes are stuck:

> 0t1783::pid2proc|::walk thread|::findstack -v
stack pointer for thread d362a680: eab3ac94
  eab3acd4 swtch+0x188()
  eab3ad24 turnstile_block+0x70b(d7b09f78, 0, fecb5418, fec04c80, 0, 0)
  eab3ad94 mutex_vector_enter+0x28f(fecb5418)
  eab3adc4 spa_all_configs+0x50(e6f9dd38, e6f9d000, 1020, 1)
  eab3ade4 zfs_ioc_pool_configs+0x16(e6f9d000, e8e3c010, 1020)
  eab3ae14 zfsdev_ioctl+0xe0()
  eab3ae44 cdev_ioctl+0x31(2d80000, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
  eab3ae74 spec_ioctl+0x6b(da7ce340, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
  eab3aec4 fop_ioctl+0x49(da7ce340, 5a04, 8041f00, 100003, e8e3c010, eab3af00)
  eab3af84 ioctl+0x171()
  eab3afac sys_sysenter+0x106()
> fecb5418::mutex
    ADDR  TYPE     HELD MINSPL OLDSPL WAITERS
fecb5418 adapt ec1b8ce0      -      -     yes
> ec1b8ce0::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
  ec223c94 swtch+0x188()
  ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
  ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
  ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
  ec223d84 spa_import_common+0x3cf()
  ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
  ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
  ec223e14 zfsdev_ioctl+0xe0()
  ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223f84 ioctl+0x171()
  ec223fac sys_call+0x10c()
>

Ok, we see that 'zpool list' is waiting for our 'zpool import ...' to complete.

Let's see what 'zfs list' is waiting for:

> 0t1754::pid2proc|::walk thread|::findstack -v
stack pointer for thread e6f8d880: eaf7cc94
  eaf7ccd4 swtch+0x188()
  eaf7cd24 turnstile_block+0x70b(d7b09f78, 0, fecb5418, fec04c80, 0, 0)
  eaf7cd94 mutex_vector_enter+0x28f(fecb5418)
  eaf7cdc4 spa_all_configs+0x50(e7c05d38, e7c05000, 1020, 1)
  eaf7cde4 zfs_ioc_pool_configs+0x16(e7c05000, e8e3c010, 1020)
  eaf7ce14 zfsdev_ioctl+0xe0()
  eaf7ce44 cdev_ioctl+0x31(2d80000, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
  eaf7ce74 spec_ioctl+0x6b(da7ce340, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
  eaf7cec4 fop_ioctl+0x49(da7ce340, 5a04, 8045ed0, 100003, e8e3c010, eaf7cf00)
  eaf7cf84 ioctl+0x171()
  eaf7cfac sys_sysenter+0x106()
> fecb5418::mutex
    ADDR  TYPE     HELD MINSPL OLDSPL WAITERS
fecb5418 adapt ec1b8ce0      -      -     yes
> ec1b8ce0::findstack -v
stack pointer for thread ec1b8ce0: ec223c54
  ec223c94 swtch+0x188()
  ec223ca4 cv_wait+0x53(eb414966, eb414928, ffffffff, 0)
  ec223ce4 txg_wait_synced+0x90(eb4147c0, 65a, 0, 0)
  ec223d34 spa_config_update_common+0x88(ddbda380, 0, 0, ec223d68)
  ec223d84 spa_import_common+0x3cf()
  ec223db4 spa_import+0x18(ea841000, eb04f5e0, eb04f658, febd9444)
  ec223de4 zfs_ioc_pool_import+0xcd(ea841000, 0, 0)
  ec223e14 zfsdev_ioctl+0xe0()
  ec223e44 cdev_ioctl+0x31(2d80000, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223e74 spec_ioctl+0x6b(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223ec4 fop_ioctl+0x49(da7ce340, 5a02, 80424a0, 100003, e8e3c208, ec223f00)
  ec223f84 ioctl+0x171()
  ec223fac sys_call+0x10c()
>

The same story.

What is the mutex they are waiting for?

> fecb5418::whatis
fecb5418 is spa_namespace_lock+0 in zfs's bss
>

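It is spa_namespace_lock, and it is held by the 'zpool import' thread (ec1b8ce0), which is itself stuck in txg_wait_synced().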
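
The overall shape of the hang (one thread holds a global lock while waiting for an event that never arrives, and everybody else queues up on that lock) can be modelled in user space. The sketch below is only an analogy in Python, not ZFS code: the names are made up for illustration, and the lock acquisition uses a timeout so the demo terminates, whereas the kernel mutex seen in the stacks above simply blocks.

```python
import threading


def run_model(timeout=0.2):
    """Toy model of the hang: returns (blocked_results, acquired_after_sync)."""
    namespace_lock = threading.Lock()   # stands in for spa_namespace_lock
    txg_synced = threading.Event()      # stands in for "the txg has synced"
    import_has_lock = threading.Event()

    def importer():
        # Like 'zpool import': take the lock, then wait for the sync.
        with namespace_lock:
            import_has_lock.set()
            txg_synced.wait()

    def lister(name, results):
        # Like 'zpool list' / 'zfs list': needs the same lock.
        got = namespace_lock.acquire(timeout=timeout)
        if got:
            namespace_lock.release()
        results[name] = got

    imp = threading.Thread(target=importer, daemon=True)
    imp.start()
    import_has_lock.wait()

    blocked = {}
    listers = [
        threading.Thread(target=lister, args=(name, blocked))
        for name in ("zpool list", "zfs list")
    ]
    for t in listers:
        t.start()
    for t in listers:
        t.join()

    # Pretend the sync finally completes; in the real hang it never does.
    txg_synced.set()
    imp.join()
    after = {}
    lister("after sync", after)
    return blocked, after["after sync"]


if __name__ == "__main__":
    blocked, after = run_model()
    print("while import waits:", blocked)
    print("after sync completes:", after)
```

In the model the listers give up after the timeout. In the real system there is no timeout, so they stay blocked for as long as the import thread holds the lock.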

Hth,
Victor
