Tomas,

comments inline...


Tomas Ögren wrote:

On 10 November, 2006 - Sanjeev Bagewadi sent me these 3,5K bytes:

1. DNLC-through-ZFS doesn't seem to listen to ncsize.

The filesystem currently has ~550k inodes and large portions of it are
frequently walked with rsync (over NFS). mdb said ncsize was about 68k and
vmstat -s said we had a hit rate of ~30%, so I set ncsize to 600k and
rebooted. That didn't seem to change much: hit rates are still about the
same, and a manual find(1) doesn't seem to get cached much (according to
vmstat and dnlcsnoop.d).
When booting, the following messages came up; not sure if they matter or not:
NOTICE: setting nrnode to max value of 351642
NOTICE: setting nrnode to max value of 235577

Is there a separate ZFS-DNLC knob to adjust for this? My wild guess is that
ZFS has its own implementation, integrated with the rest of the ZFS cache,
which throws out metadata cache in favour of data cache... or something.
Current memory usage (for some values of usage ;):
# echo ::memstat|mdb -k
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                      95584               746   75%
Anon                        20868               163   16%
Exec and libs                1703                13    1%
Page cache                   1007                 7    1%
Free (cachelist)               97                 0    0%
Free (freelist)              7745                60    6%

Total                      127004               992
Physical                   125192               978


/Tomas


This memory usage shows nearly all of memory being consumed by the kernel,
and probably by ZFS. For lack of memory, ZFS cannot add any more DNLC entries
without purging others; this shows up as dnlc_nentries staying well below
ncsize.
I don't know whether there is a DMU or ARC bug filed to reduce the memory
footprint of their internal structures in situations like this, but we are
aware of the issue.
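For reference, a quick way to compare the two on a live system is to read the
kernel variables directly (the numbers below are made up for illustration;
yours will differ):

# echo "ncsize/D" | mdb -k
ncsize:
ncsize:         600000
# echo "dnlc_nentries/D" | mdb -k
dnlc_nentries:
dnlc_nentries:  17500

If dnlc_nentries never gets anywhere near ncsize, raising ncsize further will
not help; the DNLC is being capped by memory pressure rather than by the
tunable.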
Could you please check the zio buffers and the ARC status?

Here is how you can do it:
- Start mdb, i.e. mdb -k
- Run:

::kmem_cache

- In the output generated above, check the amount consumed by the zio_buf_*,
arc_buf_t and arc_buf_hdr_t caches.
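If you would rather see the per-cache memory totals directly than multiply
BUFSIZE by BUFTOTL by hand, ::kmastat prints a memory-in-use column for every
kmem cache, and you can filter it from inside mdb, e.g.:

::kmastat ! grep zio_buf
::kmastat ! grep arc_buf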

ADDR             NAME                      FLAG  CFLAG  BUFSIZE  BUFTOTL

0000030002640a08 zio_buf_512               0000 020000      512   102675
0000030002640c88 zio_buf_1024              0200 020000     1024       48
0000030002640f08 zio_buf_1536              0200 020000     1536       70
0000030002641188 zio_buf_2048              0200 020000     2048       16
0000030002641408 zio_buf_2560              0200 020000     2560        9
0000030002641688 zio_buf_3072              0200 020000     3072       16
0000030002641908 zio_buf_3584              0200 020000     3584       18
0000030002641b88 zio_buf_4096              0200 020000     4096       12
0000030002668008 zio_buf_5120              0200 020000     5120       32
0000030002668288 zio_buf_6144              0200 020000     6144        8
0000030002668508 zio_buf_7168              0200 020000     7168     1032
0000030002668788 zio_buf_8192              0200 020000     8192        8
0000030002668a08 zio_buf_10240             0200 020000    10240        8
0000030002668c88 zio_buf_12288             0200 020000    12288        4
0000030002668f08 zio_buf_14336             0200 020000    14336      468
0000030002669188 zio_buf_16384             0200 020000    16384     3326
0000030002669408 zio_buf_20480             0200 020000    20480       16
0000030002669688 zio_buf_24576             0200 020000    24576        3
0000030002669908 zio_buf_28672             0200 020000    28672       12
0000030002669b88 zio_buf_32768             0200 020000    32768     1935
000003000266c008 zio_buf_40960             0200 020000    40960       13
000003000266c288 zio_buf_49152             0200 020000    49152        9
000003000266c508 zio_buf_57344             0200 020000    57344        7
000003000266c788 zio_buf_65536             0200 020000    65536     3272
000003000266ca08 zio_buf_73728             0200 020000    73728       10
000003000266cc88 zio_buf_81920             0200 020000    81920        7
000003000266cf08 zio_buf_90112             0200 020000    90112        5
000003000266d188 zio_buf_98304             0200 020000    98304        7
000003000266d408 zio_buf_106496            0200 020000   106496       12
000003000266d688 zio_buf_114688            0200 020000   114688        6
000003000266d908 zio_buf_122880            0200 020000   122880        5
000003000266db88 zio_buf_131072            0200 020000   131072       92

0000030002670508 arc_buf_hdr_t             0000 000000      128    11970
0000030002670788 arc_buf_t                 0000 000000       40     7308

- Dump the values of arc

arc::print struct arc

arc::print struct arc
{
   anon = ARC_anon
   mru = ARC_mru
   mru_ghost = ARC_mru_ghost
   mfu = ARC_mfu
   mfu_ghost = ARC_mfu_ghost
   size = 0x6f7a400
   p = 0x5d9bd5a
   c = 0x5f6375a
   c_min = 0x4000000
   c_max = 0x2e82a000
   hits = 0x40e0a15
   misses = 0x1cec4a4
   deleted = 0x1b0ba0d
   skipped = 0x24ea64e13
   hash_elements = 0x179d
   hash_elements_max = 0x60bb
   hash_collisions = 0x8dca3a
   hash_chains = 0x391
   hash_chain_max = 0x8
   no_grow = 0x1
}

So, about 100MB and a memory crunch..
Interesting! So it is not the ARC that is consuming too much memory (its size
of 0x6f7a400 is only about 112MB); the crunch is coming from some other piece,
which may or may not belong to ZFS.

The other possibility is that the ARC grew too large, brought the system close
to a memory crunch, and the kmem subsystem pushed back and forced the ARC to
free up its buffers (hence the no_grow flag being set). In that case the ARC
could be oscillating between caching aggressively and purging its caches.

You might want to keep track of these values (the ARC size and the no_grow
flag) and see how they change over a period of time; that would help us
understand the pattern. A rough sketch of how to do that is below.
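Something along these lines should be enough (an untested sketch; it only uses
::print, and /var/tmp/arc.log is just an example path):

while true
do
        date
        echo "arc::print struct arc size c no_grow" | mdb -k
        sleep 60
done >> /var/tmp/arc.log

Letting that run for a day and looking at how size tracks c, and how often
no_grow flips to 1, should show whether the ARC keeps hitting the ceiling and
backing off.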
And if we know it is the ARC that is causing the crunch, we could manually set
c_max to a comfortable value, and that would limit the size of the ARC.
However, I would suggest that you try it out on a non-production machine
first.
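For reference, one way to do that on a live system is to write the field with
mdb -kw (treat this as a sketch and use it carefully; the address and the
0x10000000, i.e. 256MB, cap below are only examples):

# mdb -kw
> arc::print -a struct arc c_max
30002a3b2e0 c_max = 0x2e82a000
> 30002a3b2e0/Z 0x10000000

The /Z write stores the new 8-byte value immediately; it does not persist
across a reboot.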

By default, c_max is set to 75% of physmem and that is the hard limit (in your
output above, c_max = 0x2e82a000, i.e. roughly 744MB). "c" is the soft limit:
the ARC will try to grow up to "c", and "c" itself is adjusted when there is a
need to cache more, but it will never exceed c_max.

Regarding the huge number of reads, I am sure you have already tried disabling the VDEV prefetch.
If not, it is worth a try.
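If I remember right, the device-level prefetch is the vdev cache, controlled
by the zfs_vdev_cache_* tunables in vdev_cache.c; please double-check the
exact tunable against your ON build before relying on this. Something like
the following in /etc/system, followed by a reboot, should turn it off:

* Assumption: zfs_vdev_cache_size controls the vdev (device-level) prefetch
* cache; setting it to 0 disables it.
set zfs:zfs_vdev_cache_size = 0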

Thanks and regards,
Sanjeev.

--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel: x27521 +91 80 669 27521