I am still trying to determine why ZFS on Solaris 10 (Generic_141415-03) performs so terribly on my system. I blew a good bit of my personal life savings on this setup but am not seeing performance anywhere near what is expected. Testing with iozone shows that bulk I/O performance is good, and Jeff Bonwick's 'diskqual.sh' shows the expected raw disk performance. The problem is that actual observed application performance sucks, and could often be satisfied by portable USB drives, or even by just one SAS disk, rather than this high-end SAS array.

The behavior is as if ZFS is very slow to read data: the disks are read at only 2 or 3 MB/second, followed by an intermittent write on a long cycle, and the drive lights blink slowly. It is as if ZFS does no successful sequential read-ahead on the files (see the Prefetch Data hit rate of 0% and the Prefetch Data cache miss rate of 60% below), or there is a semaphore bottleneck somewhere (but CPU use is very low).
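
For anyone who wants to watch the same thing, the per-disk read rates are easy to observe with iostat (or 'zpool iostat -v') while an application reads from the pool; the 10-second interval here is arbitrary:

# iostat -xnz 10
# zpool iostat -v Sun_2540 10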

Observed behavior is very program dependent.

# zpool status Sun_2540
  pool: Sun_2540
 state: ONLINE
status: The pool is formatted using an older on-disk format.  The pool can
        still be used, but some features are unavailable.
action: Upgrade the pool using 'zpool upgrade'.  Once this is done, the
        pool will no longer be accessible on older software versions.
 scrub: scrub completed after 0h46m with 0 errors on Mon Jun 29 05:06:33 2009
config:

        NAME                                       STATE     READ WRITE CKSUM
        Sun_2540                                   ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096A47B4559Ed0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AA047B4529Bd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096E47B456DAd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AA447B4544Fd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096147B451BEd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AA847B45605d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000096647B453CEd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AAC47B45739d0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B80003A8A0B0000097347B457D4d0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AB047B457ADd0  ONLINE       0     0     0
          mirror                                   ONLINE       0     0     0
            c4t600A0B800039C9B500000A9C47B4522Dd0  ONLINE       0     0     0
            c4t600A0B800039C9B500000AB447B4595Fd0  ONLINE       0     0     0

errors: No known data errors

% ./diskqual.sh
c1t0d0 130 MB/sec
c1t1d0 130 MB/sec
c2t202400A0B83A8A0Bd31 13422 MB/sec
c3t202500A0B83A8A0Bd31 13422 MB/sec
c4t600A0B80003A8A0B0000096A47B4559Ed0 191 MB/sec
c4t600A0B80003A8A0B0000096E47B456DAd0 192 MB/sec
c4t600A0B80003A8A0B0000096147B451BEd0 192 MB/sec
c4t600A0B80003A8A0B0000096647B453CEd0 192 MB/sec
c4t600A0B80003A8A0B0000097347B457D4d0 212 MB/sec
c4t600A0B800039C9B500000A9C47B4522Dd0 191 MB/sec
c4t600A0B800039C9B500000AA047B4529Bd0 192 MB/sec
c4t600A0B800039C9B500000AA447B4544Fd0 192 MB/sec
c4t600A0B800039C9B500000AA847B45605d0 191 MB/sec
c4t600A0B800039C9B500000AAC47B45739d0 191 MB/sec
c4t600A0B800039C9B500000AB047B457ADd0 191 MB/sec
c4t600A0B800039C9B500000AB447B4595Fd0 191 MB/sec
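
For anyone without diskqual.sh handy, it essentially times a short sequential dd read from each raw device and reports bytes over elapsed time; a rough single-device equivalent (slice s0 assumed) is:

# ptime dd if=/dev/rdsk/c1t0d0s0 of=/dev/null bs=1048576 count=1024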

% arc_summary.pl

System Memory:
         Physical RAM:  20470 MB
         Free Memory :  2371 MB
         LotsFree:      312 MB

ZFS Tunables (/etc/system):
         * set zfs:zfs_arc_max = 0x300000000
         set zfs:zfs_arc_max = 0x280000000
         * set zfs:zfs_arc_max = 0x200000000

ARC Size:
         Current Size:             9383 MB (arcsize)
         Target Size (Adaptive):   10240 MB (c)
         Min Size (Hard Limit):    1280 MB (zfs_arc_min)
         Max Size (Hard Limit):    10240 MB (zfs_arc_max)

ARC Size Breakdown:
         Most Recently Used Cache Size:           6%    644 MB (p)
         Most Frequently Used Cache Size:        93%    9595 MB (c-p)

ARC Efficency:
         Cache Access Total:             674638362
         Cache Hit Ratio:      91%       615586988      [Defined State for buffer]
         Cache Miss Ratio:      8%       59051374       [Undefined State for Buffer]
         REAL Hit Ratio:       87%       590314508      [MRU/MFU Hits Only]

         Data Demand   Efficiency:    96%
         Data Prefetch Efficiency:     7%

        CACHE HITS BY CACHE LIST:
          Anon:                        2%        13626529               [ New Customer, First Cache Hit ]
          Most Recently Used:         78%        480379752 (mru)        [ Return Customer ]
          Most Frequently Used:       17%        109934756 (mfu)        [ Frequent Customer ]
          Most Recently Used Ghost:    0%        5180256 (mru_ghost)    [ Return Customer Evicted, Now Back ]
          Most Frequently Used Ghost:  1%        6465695 (mfu_ghost)    [ Frequent Customer Evicted, Now Back ]
        CACHE HITS BY DATA TYPE:
          Demand Data:                78%        485431759
          Prefetch Data:               0%        3045442
          Demand Metadata:            16%        103900170
          Prefetch Metadata:           3%        23209617
        CACHE MISSES BY DATA TYPE:
          Demand Data:                30%        18109355
          Prefetch Data:              60%        35633374
          Demand Metadata:             6%        3806177
          Prefetch Metadata:           2%        1502468
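
The prefetch counters that arc_summary.pl is summarizing can also be watched directly while a test runs; I believe the relevant arcstats names are prefetch_data_hits and prefetch_data_misses:

# kstat -p zfs:0:arcstats | grep prefetch_data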

Prefetch seems to be performing badly. Ben Rockwood's blog entry at http://www.cuddletech.com/blog/pivot/entry.php?id=1040 discusses prefetch. The sample DTrace script on that page shows only cache misses on my system:

vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774849536: MISS
vdev_cache_read: 6507827833451031357 read 131072 bytes at offset 6774980608: MISS

Unfortunately, the file-level prefetch DTrace sample script from the same page seems to have a syntax error.
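
As a crude substitute, a one-line fbt aggregation on what I believe is the file-level prefetch entry point (dmu_zfetch) at least shows whether zfetch is being called at all during a slow read:

# dtrace -n 'fbt::dmu_zfetch:entry { @calls = count(); }'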

I tried disabling file-level prefetch (zfs_prefetch_disable=1) but did not observe any change in behavior.
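
For anyone who wants to repeat that experiment, the tunable can be flipped on a running kernel with mdb (writing 0t0 instead of 0t1 turns it back on):

# echo "zfs_prefetch_disable/W0t1" | mdb -kw

or set persistently in /etc/system:

set zfs:zfs_prefetch_disable = 1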

# kstat -p zfs:0:vdev_cache_stats
zfs:0:vdev_cache_stats:class    misc
zfs:0:vdev_cache_stats:crtime   130.61298275
zfs:0:vdev_cache_stats:delegations      754287
zfs:0:vdev_cache_stats:hits     3973496
zfs:0:vdev_cache_stats:misses   2154959
zfs:0:vdev_cache_stats:snaptime 451955.55419545
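
For what it is worth, those counters work out to a vdev cache hit rate of roughly 65%:

# echo 'scale=3; 3973496 / (3973496 + 2154959)' | bc
.648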

Performance when copying 236 GB of files (each file is 5537792 bytes, with 20001 files per directory) from one directory to another:

Copy Method                             Data Rate
====================================    ==================
cpio -pdum                              75 MB/s
cp -r                                   32 MB/s
tar -cf - . | (cd dest && tar -xf -)    26 MB/s

I would expect data copy rates approaching 200 MB/s.
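
For anyone who wants to reproduce the comparison, the three methods correspond to invocations along these lines (the paths and the ptime wrapper are illustrative, not a transcript), with the rate computed as total bytes over elapsed time:

# cd /Sun_2540/src
# ptime sh -c 'find . -depth -print | cpio -pdum /Sun_2540/dest1'
# ptime cp -r /Sun_2540/src /Sun_2540/dest2
# ptime sh -c 'tar -cf - . | (cd /Sun_2540/dest3 && tar -xf -)'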

I have not seen a peep from a ZFS developer on this list for a month or two. It would be useful if one of them would chime in to explain possible causes for this level of performance. If I am encountering this problem, then it is likely that many others are as well.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/