On Jul 4, 2009, at 12:03 AM, Bob Friesenhahn wrote:

% ./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 130 MB/sec
c2t202400A0B83A8A0Bd31 13422 MB/sec
c3t202500A0B83A8A0Bd31 13422 MB/sec
c4t600A0B80003A8A0B0000096A47B4559Ed0 191 MB/sec
c4t600A0B80003A8A0B0000096E47B456DAd0 192 MB/sec
c4t600A0B80003A8A0B0000096147B451BEd0 192 MB/sec
c4t600A0B80003A8A0B0000096647B453CEd0 192 MB/sec
c4t600A0B80003A8A0B0000097347B457D4d0 212 MB/sec
c4t600A0B800039C9B500000A9C47B4522Dd0 191 MB/sec
c4t600A0B800039C9B500000AA047B4529Bd0 192 MB/sec
c4t600A0B800039C9B500000AA447B4544Fd0 192 MB/sec
c4t600A0B800039C9B500000AA847B45605d0 191 MB/sec
c4t600A0B800039C9B500000AAC47B45739d0 191 MB/sec
c4t600A0B800039C9B500000AB047B457ADd0 191 MB/sec
c4t600A0B800039C9B500000AB447B4595Fd0 191 MB/sec

somehow i don't think that reading the first 64MB (presumably) off a raw disk device 3 times and picking the middle value is going to give you much useful information on the overall state of the disks .. i believe this was more of a quick hack to just validate that there's nothing too far out of the norm. with that said - what are the c2 and c3 devices above? you've got to be caching the heck out of those to get that unbelievable 13 GB/s - so you're really only seeing memory speeds there
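as a quick sanity check - assuming those d31 LUNs are real data LUNs and that an s0 slice actually exists on them (both guesses on my part) - you could read a span bigger than the controller cache straight off the raw device and see what number comes back:

   # dd if=/dev/rdsk/c2t202400A0B83A8A0Bd31s0 of=/dev/null bs=1024k count=2048

2 GB should be more than the controller cache can hold, so that's a lot closer to a real wire/disk number than 64MB read three times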

more useful information would be something more like the old taz or some of the disk IO latency tools when you're driving a workload.
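e.g. just watching per-LUN service times while the copy is actually running goes a long way (plain old iostat, nothing exotic - 5-second intervals, skip idle devices):

   % iostat -xnz 5

asvc_t is the average service time each LUN is delivering and actv is how deep the queue is sitting against it - which is exactly what the MB/sec numbers above hide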

% arc_summary.pl

System Memory:
         Physical RAM:  20470 MB
         Free Memory :  2371 MB
         LotsFree:      312 MB

ZFS Tunables (/etc/system):
         * set zfs:zfs_arc_max = 0x300000000
         set zfs:zfs_arc_max = 0x280000000
         * set zfs:zfs_arc_max = 0x200000000

ARC Size:
         Current Size:             9383 MB (arcsize)
         Target Size (Adaptive):   10240 MB (c)
         Min Size (Hard Limit):    1280 MB (zfs_arc_min)
         Max Size (Hard Limit):    10240 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:           6%    644 MB (p)
         Most Frequently Used Cache Size:        93%    9595 MB (c-p)

ARC Efficiency:
         Cache Access Total:             674638362
         Cache Hit Ratio:      91%       615586988      [Defined State for buffer]
         Cache Miss Ratio:      8%       59051374       [Undefined State for Buffer]
         REAL Hit Ratio:       87%       590314508      [MRU/MFU Hits Only]

         Data Demand   Efficiency:    96%
         Data Prefetch Efficiency:     7%

        CACHE HITS BY CACHE LIST:
          Anon:                           2%        13626529                [ New Customer, First Cache Hit ]
          Most Recently Used:            78%        480379752 (mru)         [ Return Customer ]
          Most Frequently Used:          17%        109934756 (mfu)         [ Frequent Customer ]
          Most Recently Used Ghost:       0%        5180256 (mru_ghost)     [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:     1%        6465695 (mfu_ghost)     [ Frequent Customer Evicted, Now Back ]
        CACHE HITS BY DATA TYPE:
          Demand Data:                78%        485431759
          Prefetch Data:               0%        3045442
          Demand Metadata:            16%        103900170
          Prefetch Metadata:           3%        23209617
        CACHE MISSES BY DATA TYPE:
          Demand Data:                30%        18109355
          Prefetch Data:              60%        35633374
          Demand Metadata:             6%        3806177
          Prefetch Metadata:           2%        1502468
---------------------------------------------

Prefetch seems to be performing badly. Ben Rockwood's blog entry at http://www.cuddletech.com/blog/pivot/entry.php?id=1040 discusses prefetch. The sample DTrace script on that page only shows cache misses:

vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774849536: MISS
vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774980608: MISS

Unfortunately, the file-level prefetch DTrace sample script from the same page seems to have a syntax error.

if you're using LUNs off an array - this might be another case of zfs_vdev_max_pending being tuned more for direct-attach drives .. you could be trying to queue up too much I/O against the RAID controller, particularly if the RAID controller is also trying to prefetch out of its cache.
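if you want to experiment with that, it can be dropped on a live system with mdb and made permanent in /etc/system - the 10 below is just a starting point to play with, not a recommendation for this particular box:

   # echo zfs_vdev_max_pending/W0t10 | mdb -kw

and to keep it across reboots, in /etc/system:

   set zfs:zfs_vdev_max_pending = 10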

I tried disabling file level prefetch (zfs_prefetch_disable=1) but did not observe any change in behavior.

this is only going to help if you've got problems in zfetch .. you'd probably see this better by looking for high lock contention in zfetch with lockstat
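something like a 60-second contention sample, then picking out the zfetch locks, would tell you quickly whether that's even in play (the options here are just one reasonable invocation, tune to taste):

   # lockstat -C -D 20 sleep 60 | grep -i zfetch

if nothing zfetch-related shows up near the top, prefetch lock contention isn't your problem and zfs_prefetch_disable isn't going to buy you anything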

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   130.61298275
zfs:0:vdev_cache_stats:delegations      754287
zfs:0:vdev_cache_stats:hits     3973496
zfs:0:vdev_cache_stats:misses   2154959
zfs:0:vdev_cache_stats:snaptime 451955.55419545

Performance when copying 236 GB of files (each file is 5537792 bytes, with 20001 files per directory) from one directory to another:

Copy Method                             Data Rate
====================================    ==================
cpio -pdum                              75 MB/s
cp -r                                   32 MB/s
tar -cf - . | (cd dest && tar -xf -)    26 MB/s

I would expect data copy rates approaching 200 MB/s.


you might want to dtrace this to break down where the latency is occurring .. e.g. is this a DNLC caching problem, an ARC problem, or a device-level problem
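for the device-level half of that, the stock io-provider pattern below gives you a latency distribution per LUN (nothing in it is specific to your setup; the DNLC/ARC side needs different probes and isn't covered here):

   #!/usr/sbin/dtrace -s
   /* time each buf from io:::start to io:::done and build a
      per-device latency distribution */
   io:::start
   {
           start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
   }

   io:::done
   /start[args[0]->b_edev, args[0]->b_blkno]/
   {
           @lat[args[1]->dev_statname] =
               quantize(timestamp - start[args[0]->b_edev, args[0]->b_blkno]);
           start[args[0]->b_edev, args[0]->b_blkno] = 0;
   }

run it while the cpio is going and the quantize buckets will show whether the c4 LUNs are the ones eating the time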

also - is this really coming off a 2540? if so - you should probably investigate the array throughput numbers and what's happening on the RAID controller .. i typically find it helpful to understand what the raw hardware is capable of (hence tools like vdbench to drive an anticipated load before i configure anything) - and then attempt to configure the various tunables to match after that
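something along these lines would baseline a single LUN for large sequential reads (the device path below is one of yours plus a guessed s2 slice, and the run parameters are only a starting point - adjust xfersize/rdpct/seekpct to whatever load you actually expect):

   cat > /tmp/seqread.parm <<'EOF'
   sd=sd1,lun=/dev/rdsk/c4t600A0B80003A8A0B0000096A47B4559Ed0s2,threads=8
   wd=wd1,sd=sd1,xfersize=128k,rdpct=100,seekpct=0
   rd=rd1,wd=wd1,iorate=max,elapsed=60,interval=5
   EOF
   ./vdbench -f /tmp/seqread.parm

do that against one LUN, then against several at once, and you know what the array can actually deliver before zfs ever enters the picture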

for now you're pretty much just at the FS/VOP layers and playing with caching when the real culprit might be more on the vdev interface layer or below

---
.je