Hi, Decided to cross post this over here as well since it seems like it could be something actually wrong and not me being an idiot. Feel free to let me know if I'm an idiot. :)
gstat is showing almost no IO hitting da1 which is my zpool(21 disk raid50 3x7). Yet the zvols are all backed up. 0 272 7 71 11.6 264 4439 0.2 4.8| da1 0 4 2 26 6.7 2 23 557.0 126.7| zvol/san/a2s61 0 14 0 0 0.0 14 86 91.5 132.5| zvol/san/a2s66 1 14 0 1 13.8 14 154 100.6 140.2| zvol/san/solsemi1 1 19 1 5 0.1 18 156 76.6 139.8| zvol/san/solman1 1 6 1 26 8.4 5 112 275.3 140.9| zvol/san/a2s62 1 16 1 5 9.1 16 317 88.1 139.7| zvol/san/solman2 1 29 1 2 6.6 29 214 48.8 139.8| zvol/san/solsemi2 1 7 1 2 8.5 6 50 232.5 140.4| zvol/san/solman4 I've tweaked only a few settings from default. [root@san2 ~]# cat /boot/loader.conf console="comconsole,vidconsole" comconsole_speed="115200" vm.kmem_size="30G" vfs.zfs.arc_max="22G" kern.hz=100 loader_logo="beastie" [root@san2 ~]# cat /etc/sysctl.conf net.inet.tcp.sendbuf_max=16777216 net.inet.tcp.recvbuf_max=16777216 net.inet.tcp.sendspace=65536 net.inet.tcp.recvspace=131072 [root@san2 ~]# zpool status pool: san state: ONLINE scrub: none requested config: NAME STATE READ WRITE CKSUM san ONLINE 0 0 0 da1 ONLINE 0 0 0 logs mirror ONLINE 0 0 0 ad6s1b ONLINE 0 0 0 ad14s1b ONLINE 0 0 0 cache ad6s1d ONLINE 0 0 0 ad14s1d ONLINE 0 0 0 errors: No known data errors All my volumes are the same. I don't manually adjust any properties. [root@san2 ~]# zfs get all san NAME PROPERTY VALUE SOURCE san type filesystem - san creation Tue Feb 8 9:58 2011 - san used 9.10T - san available 3.33T - san referenced 221M - san compressratio 1.00x - san mounted yes - san quota none default san reservation none default san recordsize 128K default san mountpoint /san default san sharenfs off default san checksum off local san compression off default san atime on default san devices on default san exec on default san setuid on default san readonly off default san jailed off default san snapdir hidden default san aclmode groupmask default san aclinherit restricted default san canmount on default san shareiscsi off default san xattr off temporary san copies 1 default san version 4 - san utf8only off - san normalization none - san casesensitivity sensitive - san vscan off default san nbmand off default san sharesmb off default san refquota none default san refreservation none default san primarycache all default san secondarycache all default san usedbysnapshots 0 - san usedbydataset 221M - san usedbychildren 9.10T - san usedbyrefreservation 0 - [root@san2 ~]# zfs get all san/a2s66 NAME PROPERTY VALUE SOURCE san/a2s66 type volume - san/a2s66 creation Wed Sep 21 16:25 2011 - san/a2s66 used 770G - san/a2s66 available 3.33T - san/a2s66 referenced 753G - san/a2s66 compressratio 1.00x - san/a2s66 reservation none default san/a2s66 volsize 750G - san/a2s66 volblocksize 4K - san/a2s66 checksum off inherited from san san/a2s66 compression off default san/a2s66 readonly off default san/a2s66 shareiscsi off default san/a2s66 copies 1 default san/a2s66 refreservation none default san/a2s66 primarycache all default san/a2s66 secondarycache all default san/a2s66 usedbysnapshots 17.3G - san/a2s66 usedbydataset 753G - san/a2s66 usedbychildren 0 - san/a2s66 usedbyrefreservation 0 - last pid: 60292; load averages: 0.96, 0.67, 0.80 up 9+17:31:59 17:41:52 63 processes: 2 running, 61 sleeping CPU: 1.3% user, 0.0% nice, 46.4% system, 1.1% interrupt, 51.2% idle Mem: 37M Active, 32M Inact, 22G Wired, 15M Cache, 1940M Buf, 1075M Free Swap: 28G Total, 13M Used, 28G Free [root@san2 ~]# sysctl -a | grep zfs vfs.zfs.l2c_only_size: 81245392896 vfs.zfs.mfu_ghost_data_lsize: 51142656 vfs.zfs.mfu_ghost_metadata_lsize: 10687021568 vfs.zfs.mfu_ghost_size: 10738164224 vfs.zfs.mfu_data_lsize: 757547008 vfs.zfs.mfu_metadata_lsize: 954693120 vfs.zfs.mfu_size: 2612401664 vfs.zfs.mru_ghost_data_lsize: 1983434752 vfs.zfs.mru_ghost_metadata_lsize: 3657913344 vfs.zfs.mru_ghost_size: 5641348096 vfs.zfs.mru_data_lsize: 9817952768 vfs.zfs.mru_metadata_lsize: 395397632 vfs.zfs.mru_size: 10833757184 vfs.zfs.anon_data_lsize: 0 vfs.zfs.anon_metadata_lsize: 0 vfs.zfs.anon_size: 34037760 vfs.zfs.l2arc_norw: 1 vfs.zfs.l2arc_feed_again: 1 vfs.zfs.l2arc_noprefetch: 0 vfs.zfs.l2arc_feed_min_ms: 200 vfs.zfs.l2arc_feed_secs: 1 vfs.zfs.l2arc_headroom: 2 vfs.zfs.l2arc_write_boost: 8388608 vfs.zfs.l2arc_write_max: 8388608 vfs.zfs.arc_meta_limit: 5905580032 vfs.zfs.arc_meta_used: 5906093432 vfs.zfs.mdcomp_disable: 0 vfs.zfs.arc_min: 2952790016 vfs.zfs.arc_max: 23622320128 vfs.zfs.zfetch.array_rd_sz: 1048576 vfs.zfs.zfetch.block_cap: 256 vfs.zfs.zfetch.min_sec_reap: 2 vfs.zfs.zfetch.max_streams: 8 vfs.zfs.prefetch_disable: 1 vfs.zfs.check_hostid: 1 vfs.zfs.recover: 0 vfs.zfs.txg.write_limit_override: 0 vfs.zfs.txg.synctime: 5 vfs.zfs.txg.timeout: 30 vfs.zfs.scrub_limit: 10 vfs.zfs.vdev.cache.bshift: 16 vfs.zfs.vdev.cache.size: 10485760 vfs.zfs.vdev.cache.max: 16384 vfs.zfs.vdev.aggregation_limit: 131072 vfs.zfs.vdev.ramp_rate: 2 vfs.zfs.vdev.time_shift: 6 vfs.zfs.vdev.min_pending: 4 vfs.zfs.vdev.max_pending: 10 vfs.zfs.cache_flush_disable: 0 vfs.zfs.zil_disable: 0 vfs.zfs.zio.use_uma: 0 vfs.zfs.version.zpl: 4 vfs.zfs.version.spa: 15 vfs.zfs.version.dmu_backup_stream: 1 vfs.zfs.version.dmu_backup_header: 2 vfs.zfs.version.acl: 1 vfs.zfs.debug: 0 vfs.zfs.super_owner: 0 kstat.zfs.misc.zfetchstats.hits: 1622932834 kstat.zfs.misc.zfetchstats.misses: 300700562 kstat.zfs.misc.zfetchstats.colinear_hits: 144156 kstat.zfs.misc.zfetchstats.colinear_misses: 300556406 kstat.zfs.misc.zfetchstats.stride_hits: 1138458507 kstat.zfs.misc.zfetchstats.stride_misses: 386271 kstat.zfs.misc.zfetchstats.reclaim_successes: 5313527 kstat.zfs.misc.zfetchstats.reclaim_failures: 295242879 kstat.zfs.misc.zfetchstats.streams_resets: 141691 kstat.zfs.misc.zfetchstats.streams_noresets: 484474231 kstat.zfs.misc.zfetchstats.bogus_streams: 0 kstat.zfs.misc.arcstats.hits: 2877951340 kstat.zfs.misc.arcstats.misses: 677132553 kstat.zfs.misc.arcstats.demand_data_hits: 1090801028 kstat.zfs.misc.arcstats.demand_data_misses: 142078773 kstat.zfs.misc.arcstats.demand_metadata_hits: 760631826 kstat.zfs.misc.arcstats.demand_metadata_misses: 15429069 kstat.zfs.misc.arcstats.prefetch_data_hits: 77566631 kstat.zfs.misc.arcstats.prefetch_data_misses: 412415335 kstat.zfs.misc.arcstats.prefetch_metadata_hits: 948951855 kstat.zfs.misc.arcstats.prefetch_metadata_misses: 107209376 kstat.zfs.misc.arcstats.mru_hits: 762645834 kstat.zfs.misc.arcstats.mru_ghost_hits: 25063620 kstat.zfs.misc.arcstats.mfu_hits: 1792233827 kstat.zfs.misc.arcstats.mfu_ghost_hits: 107949685 kstat.zfs.misc.arcstats.allocated: 1220368604 kstat.zfs.misc.arcstats.deleted: 881708637 kstat.zfs.misc.arcstats.stolen: 324197286 kstat.zfs.misc.arcstats.recycle_miss: 393866103 kstat.zfs.misc.arcstats.mutex_miss: 47835019 kstat.zfs.misc.arcstats.evict_skip: 16800403516 kstat.zfs.misc.arcstats.evict_l2_cached: 3404346428416 kstat.zfs.misc.arcstats.evict_l2_eligible: 906780261888 kstat.zfs.misc.arcstats.evict_l2_ineligible: 1712274098176 kstat.zfs.misc.arcstats.hash_elements: 12932367 kstat.zfs.misc.arcstats.hash_elements_max: 26675689 kstat.zfs.misc.arcstats.hash_collisions: 1195027725 kstat.zfs.misc.arcstats.hash_chains: 524288 kstat.zfs.misc.arcstats.hash_chain_max: 114 kstat.zfs.misc.arcstats.p: 13900189206 kstat.zfs.misc.arcstats.c: 16514222070 kstat.zfs.misc.arcstats.c_min: 2952790016 kstat.zfs.misc.arcstats.c_max: 23622320128 kstat.zfs.misc.arcstats.size: 16514197960 kstat.zfs.misc.arcstats.hdr_size: 698646840 kstat.zfs.misc.arcstats.data_size: 13480204800 kstat.zfs.misc.arcstats.other_size: 222586112 kstat.zfs.misc.arcstats.l2_hits: 236859220 kstat.zfs.misc.arcstats.l2_misses: 440273314 kstat.zfs.misc.arcstats.l2_feeds: 998879 kstat.zfs.misc.arcstats.l2_rw_clash: 41492 kstat.zfs.misc.arcstats.l2_read_bytes: 1523423294976 kstat.zfs.misc.arcstats.l2_write_bytes: 2108729975808 kstat.zfs.misc.arcstats.l2_writes_sent: 908755 kstat.zfs.misc.arcstats.l2_writes_done: 908755 kstat.zfs.misc.arcstats.l2_writes_error: 0 kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 125029 kstat.zfs.misc.arcstats.l2_evict_lock_retry: 78155 kstat.zfs.misc.arcstats.l2_evict_reading: 52 kstat.zfs.misc.arcstats.l2_free_on_write: 735076 kstat.zfs.misc.arcstats.l2_abort_lowmem: 2368 kstat.zfs.misc.arcstats.l2_cksum_bad: 9 kstat.zfs.misc.arcstats.l2_io_error: 0 kstat.zfs.misc.arcstats.l2_size: 88680833024 kstat.zfs.misc.arcstats.l2_hdr_size: 2275280224 kstat.zfs.misc.arcstats.memory_throttle_count: 0 kstat.zfs.misc.arcstats.l2_write_trylock_fail: 160181805 kstat.zfs.misc.arcstats.l2_write_passed_headroom: 48073379 kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0 kstat.zfs.misc.arcstats.l2_write_in_l2: 101326826532 kstat.zfs.misc.arcstats.l2_write_io_in_progress: 3016312 kstat.zfs.misc.arcstats.l2_write_not_cacheable: 16631379447 kstat.zfs.misc.arcstats.l2_write_full: 158541 kstat.zfs.misc.arcstats.l2_write_buffer_iter: 998879 kstat.zfs.misc.arcstats.l2_write_pios: 908755 kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 881025143301120 kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 62580701 kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 2040782 kstat.zfs.misc.vdev_cache_stats.delegations: 2167916 kstat.zfs.misc.vdev_cache_stats.hits: 2801310 kstat.zfs.misc.vdev_cache_stats.misses: 5448597 On Tue, Oct 4, 2011 at 2:43 AM, Dave Cundiff <syshack...@gmail.com> wrote: > Hi, > > I'm running 8.2-RELEASE and running into an IO lockup on ZFS that is > happening pretty regularly. The system is stock except for the > following set in loader.conf > > vm.kmem_size="30G" > vfs.zfs.arc_max="22G" > kern.hz=100 > > I know the kmem settings aren't SUPPOSED to be necessary now, buy my > ZFS boxes were crashing until I added them. The machine has 24 gigs of > RAM. The kern.hz=100 was to stretch out the l2arc bug that pops up at > 28days with it set to 1000. > > [root@san2 ~]# zpool status > pool: san > state: ONLINE > scrub: none requested > config: > > NAME STATE READ WRITE CKSUM > san ONLINE 0 0 0 > da1 ONLINE 0 0 0 > logs > mirror ONLINE 0 0 0 > ad6s1b ONLINE 0 0 0 > ad14s1b ONLINE 0 0 0 > cache > ad6s1d ONLINE 0 0 0 > ad14s1d ONLINE 0 0 0 > > errors: No known data errors > > > Here's a zpool iostat from a machine in trouble. > > san 9.08T 3.55T 0 0 0 7.92K > san 9.08T 3.55T 0 447 0 5.77M > san 9.08T 3.55T 0 309 0 2.83M > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 62 0 2.22M 0 > san 9.08T 3.55T 0 2 0 23.5K > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 254 0 6.62M > san 9.08T 3.55T 0 249 0 3.16M > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 34 0 491K 0 > san 9.08T 3.55T 0 6 0 62.7K > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 85 0 6.59M > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 452 0 4.88M > san 9.08T 3.55T 109 0 3.12M 0 > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 0 0 7.84K > san 9.08T 3.55T 0 434 0 6.41M > san 9.08T 3.55T 0 0 0 0 > san 9.08T 3.55T 0 304 0 2.90M > san 9.08T 3.55T 37 0 628K 0 > > Its supposed to look like > > san 9.07T 3.56T 162 167 3.75M 6.09M > san 9.07T 3.56T 5 0 47.4K 0 > san 9.07T 3.56T 19 0 213K 0 > san 9.07T 3.56T 120 0 3.26M 0 > san 9.07T 3.56T 92 0 741K 0 > san 9.07T 3.56T 114 0 2.86M 0 > san 9.07T 3.56T 72 0 579K 0 > san 9.07T 3.56T 14 0 118K 0 > san 9.07T 3.56T 24 0 213K 0 > san 9.07T 3.56T 25 0 324K 0 > san 9.07T 3.56T 8 0 126K 0 > san 9.07T 3.56T 28 0 505K 0 > san 9.07T 3.56T 15 0 126K 0 > san 9.07T 3.56T 11 0 158K 0 > san 9.07T 3.56T 19 0 356K 0 > san 9.07T 3.56T 198 0 3.55M 0 > san 9.07T 3.56T 21 0 173K 0 > san 9.07T 3.56T 18 0 150K 0 > san 9.07T 3.56T 23 0 260K 0 > san 9.07T 3.56T 9 0 78.3K 0 > san 9.07T 3.56T 21 0 173K 0 > san 9.07T 3.56T 2 4.59K 16.8K 142M > san 9.07T 3.56T 12 0 103K 0 > san 9.07T 3.56T 26 454 312K 4.35M > san 9.07T 3.56T 111 0 3.34M 0 > san 9.07T 3.56T 28 0 870K 0 > san 9.07T 3.56T 75 0 3.88M 0 > san 9.07T 3.56T 43 0 1.22M 0 > san 9.07T 3.56T 26 0 270K 0 > > I don't know what triggers the problem but I know how to fix it. If I > perform a couple snapshot deletes the IO will come back in line every > single time. Fortunately I have LOTS of snapshots to delete. > > [root@san2 ~]# zfs list -r -t snapshot | wc -l > 5236 > [root@san2 ~]# zfs list -r -t volume | wc -l > 17 > > Being fairly new to FreeBSD and ZFS I'm pretty clueless on where to > begin tracking this down. I've been staring at gstat trying to see if > a zvol is getting a big burst of writes that may be flooding the drive > controller but I haven't caught anything yet. top -S -H shows > zio_write_issue threads consuming massive amounts of CPU during the > lockup. Normally they sit around 5-10%. Any suggestions on where I > could start to track this down would be greatly appreciated. > -- Dave Cundiff System Administrator A2Hosting, Inc http://www.a2hosting.com _______________________________________________ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"