Hello Robert,

Wednesday, August 2, 2006, 12:22:11 AM, you wrote:
RM> Hello Neil,
RM> Tuesday, August 1, 2006, 8:45:02 PM, you wrote:

NP>> Robert Milkowski wrote On 08/01/06 11:41,:
>>> Hello Robert,
>>>
>>> Monday, July 31, 2006, 12:48:30 AM, you wrote:
>>>
>>> RM> Hello ZFS,
>>>
>>> RM> System was rebooted and after reboot server again
>>> RM> System is snv_39, SPARC, T2000
>>>
>>> RM> bash-3.00# ptree
>>> RM> 7     /lib/svc/bin/svc.startd -s
>>> RM>   163   /sbin/sh /lib/svc/method/fs-local
>>> RM>     254   /usr/sbin/zfs mount -a
>>> RM> [...]
>>>
>>> RM> bash-3.00# zfs list|wc -l
>>> RM> 46
>>>
>>> RM> Using df I can see most file systems are already mounted.
>>>
>>> RM> > ::ps!grep zfs
>>> RM> R    254    163      7      7      0 0x4a004000 00000600219e1800 zfs
>>> RM> > 00000600219e1800::walk thread|::findstack -v
>>> RM> stack pointer for thread 300013026a0: 2a10069ebb1
>>> RM> [ 000002a10069ebb1 cv_wait+0x40() ]
>>> RM>   000002a10069ec61 txg_wait_synced+0x54(3000052f0d0, 2f2216d, 3000107fb90, 3000052f110, 3000052f112, 3000052f0c8)
>>> RM>   000002a10069ed11 zil_replay+0xbc(60022609c08, 600226a1840, 600226a1870, 700d13d8, 7ba2d000, 60006821200)
>>> RM>   000002a10069ee01 zfs_domount+0x1f0(7ba35000, 700d1000, 60021a8bae0, 1, 0, 0)
>>> RM>   000002a10069eed1 zfs_mount+0x160(600225f6080, 1868800, 2a10069f9d8, 60021a8bae0, 2f, 0)
>>> RM>   000002a10069efb1 domount+0x9b0(2a10069f8b0, 1, 600225f6080, 0, 0, 0)
>>> RM>   000002a10069f121 mount+0x110(600013fbcf8, 2a10069fad8, 0, 0, ff38e40c, 100)
>>> RM>   000002a10069f221 syscall_ap+0x44(2a0, ffbfeca0, 1118bf4, 600013fbc40, 15, 0)
>>> RM>   000002a10069f2e1 syscall_trap32+0xcc(52b3c8, ffbfeca0, 100, ff38e40c, 0, 0)
>>>
>>> RM> # iostat -xnz 1
>>> RM>                     extended device statistics
>>> RM>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>> RM>     3.0  513.0  192.0 2822.1  0.0  2.1    0.0    4.0   0  95 c4t600C0FF00000000009258F3E4C4C5601d0
>>> RM>                     extended device statistics
>>> RM>     r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
>>> RM>     7.0  598.1  388.6 1832.9  0.0  2.0    0.0    3.4   0  93 c4t600C0FF00000000009258F3E4C4C5601d0
>>> RM> ^C
>>> RM> bash-3.00#
>>>
>>> RM> However for many seconds no IOs at all are issued.
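
The stack above shows the mount thread waiting in txg_wait_synced() called
from zil_replay(). To see how long those calls actually take while
zfs mount -a is stuck, something along these lines could be run from another
shell - an untested sketch; the function names come from the stack above,
the aggregation labels and thread-local variable names are just mine:

# dtrace -n 'fbt::zil_replay:entry{ self->zr = timestamp; }' \
         -n 'fbt::zil_replay:return/self->zr/{ @["zil_replay (ns)"] = quantize(timestamp - self->zr); self->zr = 0; }' \
         -n 'fbt::txg_wait_synced:entry{ self->tw = timestamp; }' \
         -n 'fbt::txg_wait_synced:return/self->tw/{ @["txg_wait_synced (ns)"] = quantize(timestamp - self->tw); self->tw = 0; }'

Ctrl-C after a mount or two completes; if nearly all of the time lands in the
txg_wait_synced() distribution, the mounts are blocked on the pool sync
rather than on reading the intent logs themselves.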
>>> RM> # mpstat 1
>>> RM> [...]
>>> RM> CPU minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
>>> RM>   0    0   0   74   209  109    0    0    0    0    0     0    0   1   0  99
>>> RM>   1    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>   2    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>   3    0   0    0     1    0    0    0    0    0    0     0    0 100   0   0
>>> RM>   4    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>   5    0   0    0     6    1    8    0    0    0    0     0    0   0   0 100
>>> RM>   6    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>   7    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>   8    3   0    1     7    0   12    0    0    2    0   245    1   1   0  98
>>> RM>   9    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  10    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  11    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  12    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  13    0   0    0     6    0   10    0    0    0    0     0    0   0   0 100
>>> RM>  14    0   0    0     3    0    2    1    0    0    0   188    0   1   0  99
>>> RM>  15    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  16    0   0    0     8    1   12    0    0    0    0     0    0   0   0 100
>>> RM>  17    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
>>> RM>  18    0   0    0    23    0   43    0    0    1    0     0    0   0   0 100
>>> RM>  19    0   0    0     9    0   16    0    0    0    0     0    0   0   0 100
>>> RM>  20    0   0    0     8    0   14    0    0    0    0     0    0   0   0 100
>>> RM>  21    0   0    0     7    2    8    0    1    0    0     1    0   1   0  99
>>> RM>  22    0   0   17    38   34    7    0    1    1    0     5    0   0   0 100
>>> RM>  23    0   0    0     7    0   12    0    0    0    0     0    0   0   0 100
>>> RM>  24    0   0    0     4    0    6    0    1    0    0     0    0   0   0 100
>>> RM>  25    0   0    1     5    0    8    0    1    0    0     0    0   0   0 100
>>> RM>  26    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
>>> RM>  27    0   0    0     2    0    2    0    0    0    0     0    0   0   0 100
>>> RM>  28    0   0    0     2    0    2    0    0    1    0     0    0   0   0 100
>>> RM>  29    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  30    0   0    0     1    0    0    0    0    0    0     0    0   0   0 100
>>> RM>  31    0   0    0     6    0    8    0    0    0    0     0    0   0   0 100
>>> RM> ^C
>>> RM> bash-3.00#
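
mpstat shows a single hardware thread (CPU 3 in the output above) pegged at
100% sys while almost everything else idles. A cheap way to see which kernel
functions that CPU is spinning in is to sample it with the profile provider -
again only a sketch (adjust the cpu number to match); the fbt-based timing
quoted below gives per-function totals, but enabling every fbt probe has much
higher overhead than sampling:

# dtrace -n 'profile-997/cpu == 3 && arg0/{ @[func(arg0)] = count(); }' \
         -n 'tick-10s{ trunc(@, 20); printa(@); exit(0); }'

It keeps only the top 20 sampled kernel functions and exits after 10 seconds,
so it is safe to leave running even while the box is otherwise unresponsive.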
>>> RM> # dtrace -n fbt:::entry'{self->t=timestamp;}' -n
>>> RM> fbt:::return'/self->t/{@[probefunc]=sum(timestamp-self->t);self->t=0;}'
>>> RM> -n tick-5s'{printa(@);exit(0);}'
>>> RM> [...]
>>> RM>   syscall_mstate                                             12636328
>>> RM>   callout_schedule_1                                         14428108
>>> RM>   hwblkclr                                                   16656308
>>> RM>   avl_rotation                                               17196252
>>> RM>   page_pptonum                                               19457456
>>> RM>   sfmmu_mlist_enter                                          20078508
>>> RM>   sfmmu_mlist_exit                                           20804176
>>> RM>   page_numtomemseg_nolock                                    23377972
>>> RM>   avl_insert                                                 23727512
>>> RM>   avl_insert_here                                            25281752
>>> RM>   sfmmu_mlspl_enter                                          32277524
>>> RM>   kmem_cache_alloc                                           38497240
>>> RM>   xc_serv                                                    94880184
>>> RM>   disp_getwork                                              118366960
>>> RM>   disp_anywork                                              191004096
>>> RM>   space_map_seg_compare                                     573117700
>>> RM>   resume                                                   7059674296
>>> RM> #
>>>
>>> RM> bash-3.00# dtrace -n fbt::space_map_seg_compare:entry'{@[stack()]=count();}'
>>> RM> dtrace: description 'fbt::space_map_seg_compare:entry' matched 1 probe
>>> RM> ^C
>>>
>>> RM>               genunix`avl_find+0x38
>>> RM>               zfs`space_map_add+0x12c
>>> RM>               zfs`space_map_load+0x214
>>> RM>               zfs`metaslab_activate+0x3c
>>> RM>               zfs`metaslab_group_alloc+0x1b0
>>> RM>               zfs`metaslab_alloc_dva+0x10c
>>> RM>               zfs`metaslab_alloc+0x2c
>>> RM>               zfs`zio_dva_allocate+0x50
>>> RM>               zfs`zio_write_compress+0x1e4
>>> RM>               zfs`arc_write+0xbc
>>> RM>               zfs`dbuf_sync+0x6b0
>>> RM>               zfs`dnode_sync+0x300
>>> RM>               zfs`dmu_objset_sync_dnodes+0x68
>>> RM>               zfs`dmu_objset_sync+0x50
>>> RM>               zfs`dsl_dataset_sync+0xc
>>> RM>               zfs`dsl_pool_sync+0x60
>>> RM>               zfs`spa_sync+0xe0
>>> RM>               zfs`txg_sync_thread+0x130
>>> RM>               unix`thread_start+0x4
>>> RM>            316769
>>>
>>> RM>               genunix`avl_find+0x38
>>> RM>               zfs`space_map_remove+0x98
>>> RM>               zfs`space_map_load+0x214
>>> RM>               zfs`metaslab_activate+0x3c
>>> RM>               zfs`metaslab_group_alloc+0x1b0
>>> RM>               zfs`metaslab_alloc_dva+0x10c
>>> RM>               zfs`metaslab_alloc+0x2c
>>> RM>               zfs`zio_dva_allocate+0x50
>>> RM>               zfs`zio_write_compress+0x1e4
>>> RM>               zfs`arc_write+0xbc
>>> RM>               zfs`dbuf_sync+0x6b0
>>> RM>               zfs`dnode_sync+0x300
>>> RM>               zfs`dmu_objset_sync_dnodes+0x68
>>> RM>               zfs`dmu_objset_sync+0x50
>>> RM>               zfs`dsl_dataset_sync+0xc
>>> RM>               zfs`dsl_pool_sync+0x60
>>> RM>               zfs`spa_sync+0xe0
>>> RM>               zfs`txg_sync_thread+0x130
>>> RM>               unix`thread_start+0x4
>>> RM>            404210
>>> RM> bash-3.00#
>>>
>>> RM> The above was running for 3-4 seconds.
>>>
>>> RM> I can see some progress - it mounts another file system every
>>> RM> 3-10 minutes.
>>>
>>> RM> However, overall it can take over an hour to boot the server before
>>> RM> zfs mount -a actually mounts all the file systems. Also, with default
>>> RM> Solaris configs you can't log in during this stage, either on the
>>> RM> console or via ssh, and no messages are issued. That is bad too, as
>>> RM> you don't know what is going on. We had to reboot again, boot to
>>> RM> single user, start sshd, log in via ssh and let the boot process
>>> RM> proceed in order to see why the server wouldn't boot.
>>>
>>> RM> According to zpool status all pools are ok and no errors at all are
>>> RM> reported.
>>>
>>> RM> Summary:
>>>
>>> RM> 1. zfs mount -a shouldn't "sleep" for an hour!
>>> RM> 2. One virtual CPU core of a 1GHz T2000 is consumed most of the time
>>> RM>    during zfs mount -a - why is it so CPU intensive? On that server
>>> RM>    the CPU became the limiting factor, to the point that usually no
>>> RM>    IOs were issued to the disks during the mount. When the same thing
>>> RM>    happened on a server with a faster CPU the disks were doing
>>> RM>    100MB/s, but it still took 10-20 minutes to mount the file systems.
>>> RM> 3. I thought the problem of zfs mount taking this long was already
>>> RM>    addressed, so that it would return earlier. So either this is
>>> RM>    completely new or something is still wrong. I'm thinking about
>>> RM>    http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6377670
>>> RM> 4. It would be better if zfs mount -a mounted the file systems in
>>> RM>    parallel, as it looks like it mounts them serially right now.
>>>
>>> RM> The above behavior is really BAD and it makes ZFS look like fsck is
>>> RM> actually built in - I know it isn't.
>>>
>>> Any ideas?

NP>> It looks to me like the mount code, before replaying the log, flushes all
NP>> transactions in the group (txg_wait_synced()) to get some consistency.
NP>> This then takes a very long time - searching the space map, perhaps
NP>> because it's nearly full. Is it?

RM> More than 200GB free in a pool of about 4.5TB.
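
One way to check whether it really is the space map search would be to time
space_map_load() itself while a mount (or the quota change below) is stuck -
again a rough, untested sketch, with the function name taken from the stacks
earlier in the thread:

# dtrace -n 'fbt::space_map_load:entry{ self->sm = timestamp; }' \
         -n 'fbt::space_map_load:return/self->sm/{ @["space_map_load (ns)"] = quantize(timestamp - self->sm); self->sm = 0; }'

If single calls take seconds, that points at very large or fragmented space
maps (lots of avl_find work per metaslab loaded, as in the stacks above)
rather than at the log replay itself.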
I ran 'zfs set quota=400g nfs-s5-s8/d5201 nfs-s5-s8/d5202 nfs-s5-s8/d5203
nfs-s5-s8/d5' and it's still running after 30 minutes.

# zpool iostat nfs-s5-s8 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
nfs-s5-s8   4.24T   302G    243     25  13.1M   223K
nfs-s5-s8   4.24T   302G    409      0  15.0M      0
nfs-s5-s8   4.24T   302G    162      0  8.95M      0
nfs-s5-s8   4.24T   302G    317      0  18.9M      0
nfs-s5-s8   4.24T   302G    298     30  24.0M  90.0K
nfs-s5-s8   4.24T   302G    374      0  26.9M      0

# mdb -kw
Loading modules: [ unix krtld genunix specfs dtrace ufs sd pcisch md ip sctp
usba lofs logindmux ptm qlc fctl cpc fcp fcip random crypto zfs ssd nfs ]
> ::ps!grep zfs
R  23426   2460  23426   2454      0 0x4a004000 0000060002a66790 zfs
> 0000060002a66790::walk thread|::findstack -v
stack pointer for thread 3006703b320: 2a10af5ccb1
[ 000002a10af5ccb1 cv_wait+0x40() ]
  000002a10af5cd61 txg_wait_synced+0x54(30055528990, 2f23aa5, 2f23aa3, 300555289d0, 300555289d2, 30055528988)
  000002a10af5ce11 dsl_sync_task_group_wait+0x120(300c0860360, 0, 2f23aa5, 71, 7b64ecf8, 300c0860388)
  000002a10af5cec1 dsl_sync_task_do+0x28(300c0860360, 7b611dcc, 7b611e78, 60009e5e380, 2a10af5d828, 0)
  000002a10af5cf71 dsl_dir_set_quota+0x64(7b648400, 6400000000, 7b611c00, 7b611c00, 60009e5e380, 0)
  000002a10af5d031 zfsdev_ioctl+0x158(701d8c00, 3f, ffbfec18, 15, 54, 600027aa000)
  000002a10af5d0e1 fop_ioctl+0x20(600026dd280, 5a15, ffbfec18, 100003, 6001cacde58, 120e894)
  000002a10af5d191 ioctl+0x184(3, 60011c64ab8, ffbfec18, b614c, 1010101, 5a15)
  000002a10af5d2e1 syscall_trap32+0xcc(3, 5a15, ffbfec18, b614c, 1010101, fffffd74)

Setting a quota on file systems in other pools takes something like 10s,
which is still quite long, but not 30 minutes (or more)! The only difference
I see is that nfs-s5-s8 has less free space than the other pools. This is
snv_43, as I reimported the pool on another server.

Something is terribly wrong here.

--
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss