Hello. Having some hangs on a snv53 machine, which is quite probably ZFS+NFS related, since that's all the machine does ;)
The machine is a 2x750MHz Blade 1000 with 2GB RAM, using a SysKonnect 9821 GigE card (with their 8.19.1.3 skge driver) and two HP-branded MPT SCSI cards. Normal load is pretty much "read all you can" from misc tarballs and ISOs, since it's an NFS backend to our caching http/ftp cluster delivering free software to the world.

What happens is that the machine just stops responding.. it can respond to ping for a while (while userland is non-responsive, including the console), but after a while that stops too.. Produced a panic to get a dump and tried ::memstat:

unterweser:/scratch/070103# mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci
pcisch ssd fcp fctl qlc md ip hook neti sctp arp usba s1394 nca lofs
zfs random sd nfs ptm cpc ]
> ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     250919              1960   98%
Anon                          888                 6    0%
Exec and libs                 247                 1    0%
Page cache                     38                 0    0%
Free (cachelist)              405                 3    0%
Free (freelist)              4370                34    2%

Total                      256867              2006
Physical                   253028              1976

That doesn't seem too healthy to me.. probably something kernel-y eating up everything and the machine is just swapping to death or something..
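As a sanity check on the ::memstat numbers above, a quick sketch of the arithmetic (assuming the sun4u base page size of 8 KB, which is not itself shown in the dump):

```python
# Rough cross-check of the ::memstat output above: on sun4u the base
# page size is 8 KB (assumption), so Pages * 8 KiB should roughly
# match the reported MB column (mdb truncates fractions).
PAGESIZE = 8192  # assumed sun4u base page size, bytes

memstat = {            # (pages, MB) as reported by ::memstat
    "Kernel":           (250919, 1960),
    "Anon":             (888, 6),
    "Exec and libs":    (247, 1),
    "Page cache":       (38, 0),
    "Free (cachelist)": (405, 3),
    "Free (freelist)":  (4370, 34),
}

for name, (pages, mb) in memstat.items():
    computed = pages * PAGESIZE / 2**20
    print(f"{name:18s} {computed:8.1f} MiB (reported {mb} MB)")

# Kernel: 250919 * 8192 / 2**20 ~= 1960 MiB -- consistent with the
# kernel owning ~98% of the 2GB of physical memory at panic time.
```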
A dump from the live kernel with mdb -k after 1.5h uptime:

Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     212310              1658   83%
Anon                        11307                88    4%
Exec and libs                2418                18    1%
Page cache                  18400               143    7%
Free (cachelist)             4383                34    2%
Free (freelist)              8049                62    3%

The tweaks I have are:

set ncsize = 500000
set nfs:nrnode = 50
set zfs:zil_disable=1
set zfs:zfs_vdev_cache_bshift=14
set zfs:zfs_vdev_cache_size=0

Which according to ::kmem_cache results in about:

0000030002e30008 dmu_buf_impl_t   0000 000000    328 487728
0000030002e30288 dnode_t          0000 000000    640 453204
0000030002e30508 arc_buf_hdr_t    0000 000000    144 103544
0000030002e30788 arc_buf_t        0000 000000     40  36743
0000030002e30a08 zil_lwb_cache    0000 000000    200      0
0000030002e30c88 zfs_znode_cache  0000 000000    200 453200

but those buffers add up to about 550MB.. dnlc_nentries on the hung machine has gone down to 15000.. (where are the rest of the ~450k-15k dnodes/znodes hanging out?)

Hung kernel:

> arc::print
{
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x358a0600
    p = 0x4000000
    c = 0x4000000
    c_min = 0x4000000
    c_max = 0x5e114800
    hits = 0xbc860fd
    misses = 0x2f296e1
    deleted = 0x1d88739
    recycle_miss = 0xf7f30c
    mutex_miss = 0x24b13d
    evict_skip = 0x21501d02
    hash_elements = 0x27f97
    hash_elements_max = 0x27f97
    hash_collisions = 0x1651b43
    hash_chains = 0x7ac3
    hash_chain_max = 0x12
    no_grow = 0x1
}

Live kernel:

> arc::print
{
    anon = ARC_anon
    mru = ARC_mru
    mru_ghost = ARC_mru_ghost
    mfu = ARC_mfu
    mfu_ghost = ARC_mfu_ghost
    size = 0x1b279400
    p = 0x1a1dcaa4
    c = 0x1a1dcaa4
    c_min = 0x4000000
    c_max = 0x5e114800
    hits = 0xef7c96
    misses = 0x25efa8
    deleted = 0x1db537
    recycle_miss = 0xa6221
    mutex_miss = 0x12b59
    evict_skip = 0x70d62b
    hash_elements = 0xcda1
    hash_elements_max = 0x1b589
    hash_collisions = 0x18e58a
    hash_chains = 0x3d16
    hash_chain_max = 0xf
    no_grow = 0x1
}

Should I post ::kmem_cache and/or ::kmastat somewhere? It's about 2*(20+30)kB..
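For reference, the ~550MB figure above is just bufsize times allocated buffers for each cache, and the hex arc::print values decode the same way. A sketch of that arithmetic, with numbers copied from the dumps (and assuming the last two ::kmem_cache columns are bufsize and buffers in use):

```python
# (bufsize in bytes, buffers in use) for the ZFS-related kmem caches,
# copied from the ::kmem_cache output of the hung kernel above.
caches = {
    "dmu_buf_impl_t":  (328, 487728),
    "dnode_t":         (640, 453204),
    "arc_buf_hdr_t":   (144, 103544),
    "arc_buf_t":       (40,  36743),
    "zfs_znode_cache": (200, 453200),
}
total = sum(size * count for size, count in caches.values())
print(f"cache payload: {total / 2**20:.0f} MiB")  # ~531 MiB (~557 MB decimal)

# The hung kernel's ARC, decoded from arc::print:
size, c, c_min, c_max = 0x358a0600, 0x4000000, 0x4000000, 0x5e114800
print(f"arc size {size / 2**20:.0f} MiB, target c {c / 2**20:.0f} MiB "
      f"(c_min {c_min / 2**20:.0f}, c_max {c_max / 2**20:.0f})")
# size is roughly 13x the target c, which has collapsed to c_min --
# i.e. the ARC wants to shrink but is not shedding memory (note
# no_grow = 0x1 and the very large evict_skip counter above).
```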
/Tomas
--
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss