Tomas,
There are a couple of things going on here:
1. There is a lot of fragmentation in your metadata caches (znode,
dnode, dbuf, etc.). This is burning up about 300MB of space in your
hung kernel. This is a known problem that we are currently working
on.
2. While the ARC has set its desired size down to c_min (64MB), it's
actually still consuming ~800MB in the hung kernel. This is odd.
The bulk of this space is in the 32K and 64K data caches. Could
you print out the contents of ARC_anon, ARC_mru, ARC_mfu, ARC_mru_ghost,
and ARC_mfu_ghost?
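Something along these lines in mdb against the dump should do it (just a
sketch; it assumes CTF data for the zfs module is available and that the
state structures are of type arc_state_t, which may differ slightly by
build):

ARC_anon::print struct arc_state
ARC_mru::print struct arc_state
ARC_mru_ghost::print struct arc_state
ARC_mfu::print struct arc_state
ARC_mfu_ghost::print struct arc_state

The per-state size fields should tell us where that ~800MB is actually
sitting.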
-Mark
Tomas Ögren wrote:
Hello.
Having some hangs on an snv53 machine which is quite probably ZFS+NFS
related, since that's all the machine does ;)
The machine is a 2x750MHz Blade1000 with 2GB ram, using a SysKonnect
9821 GigE card (with their 8.19.1.3 skge driver) and two HP branded MPT
SCSI cards. Normal load is pretty much "read all you can" with misc
tarballs and isos since it's an NFS backend to our caching http/ftp
cluster delivering free software to the world.
What happens is that the machine just stops responding. It keeps answering
ping for a while (while userland, including the console, is unresponsive),
but after a while that stops too.
Produced a panic to get a dump and tried ::memstat;
unterweser:/scratch/070103# mdb unix.0 vmcore.0
Loading modules: [ unix krtld genunix specfs dtrace ufs scsi_vhci pcisch
ssd fcp fctl qlc md ip hook neti sctp arp usba s1394 nca lofs zfs random
sd nfs ptm cpc ]
::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     250919              1960   98%
Anon                          888                 6    0%
Exec and libs                 247                 1    0%
Page cache                     38                 0    0%
Free (cachelist)              405                 3    0%
Free (freelist)              4370                34    2%

Total                      256867              2006
Physical                   253028              1976
That doesn't seem too healthy to me; something in the kernel is probably
eating up everything and the machine is just swapping to death or something.
A dump from the live kernel with mdb -k after 1.5h uptime;
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                     212310              1658   83%
Anon                        11307                88    4%
Exec and libs                2418                18    1%
Page cache                  18400               143    7%
Free (cachelist)             4383                34    2%
Free (freelist)              8049                62    3%
The tweaks I have are:
set ncsize = 500000
set nfs:nrnode = 50
set zfs:zil_disable=1
set zfs:zfs_vdev_cache_bshift=14
set zfs:zfs_vdev_cache_size=0
According to ::kmem_cache, the relevant caches look roughly like this
(cache address, name, flags, buffer size, total buffers):
0000030002e30008 dmu_buf_impl_t    0000 000000     328   487728
0000030002e30288 dnode_t           0000 000000     640   453204
0000030002e30508 arc_buf_hdr_t     0000 000000     144   103544
0000030002e30788 arc_buf_t         0000 000000      40    36743
0000030002e30a08 zil_lwb_cache     0000 000000     200        0
0000030002e30c88 zfs_znode_cache   0000 000000     200   453200
but those buffers alone add up to about 550MB.
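Rough math, buffer size times buffer count for each cache (slab and
fragmentation overhead not included):

  dmu_buf_impl_t     328 * 487728 = 159974784
  dnode_t            640 * 453204 = 290050560
  arc_buf_hdr_t      144 * 103544 =  14910336
  arc_buf_t           40 *  36743 =   1469720
  zfs_znode_cache    200 * 453200 =  90640000
                            total = 557045400 bytes, i.e. roughly the
                                    550MB above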
dnlc_nentries on the hung kernel has gone down to 15000.. (where are the
rest of the ~450k-15k dnodes/znodes hanging out?)
Hung kernel:
arc::print
{
anon = ARC_anon
mru = ARC_mru
mru_ghost = ARC_mru_ghost
mfu = ARC_mfu
mfu_ghost = ARC_mfu_ghost
size = 0x358a0600
p = 0x4000000
c = 0x4000000
c_min = 0x4000000
c_max = 0x5e114800
hits = 0xbc860fd
misses = 0x2f296e1
deleted = 0x1d88739
recycle_miss = 0xf7f30c
mutex_miss = 0x24b13d
evict_skip = 0x21501d02
hash_elements = 0x27f97
hash_elements_max = 0x27f97
hash_collisions = 0x1651b43
hash_chains = 0x7ac3
hash_chain_max = 0x12
no_grow = 0x1
}
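(Converting those: size 0x358a0600 is about 856MB, while c = c_min =
0x4000000 = 64MB and c_max = 0x5e114800 is about 1.5GB, so the ARC is
sitting at more than thirteen times its target.)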
Live kernel:
arc::print
{
anon = ARC_anon
mru = ARC_mru
mru_ghost = ARC_mru_ghost
mfu = ARC_mfu
mfu_ghost = ARC_mfu_ghost
size = 0x1b279400
p = 0x1a1dcaa4
c = 0x1a1dcaa4
c_min = 0x4000000
c_max = 0x5e114800
hits = 0xef7c96
misses = 0x25efa8
deleted = 0x1db537
recycle_miss = 0xa6221
mutex_miss = 0x12b59
evict_skip = 0x70d62b
hash_elements = 0xcda1
hash_elements_max = 0x1b589
hash_collisions = 0x18e58a
hash_chains = 0x3d16
hash_chain_max = 0xf
no_grow = 0x1
}
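(Here, by contrast, size 0x1b279400 is about 434MB against c = p =
0x1a1dcaa4, about 418MB, so on the live kernel the ARC is at least close
to its target.)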
Should I post the full ::kmem_cache and/or ::kmastat output somewhere? It's
about 2*(20+30)kB.
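If it helps, I can capture them non-interactively with something like this
(file names are just examples):

echo ::kmem_cache | mdb -k > kmem_cache.live.txt
echo ::kmastat | mdb unix.0 vmcore.0 > kmastat.crash.txt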
/Tomas