On Sun, 22 Jun 2008, Will Murnane wrote:
>
>> Perhaps the solution is to install more RAM in the system so that the
>> stripe is fully cached and ZFS does not need to go back to disk prior
>> to writing an update.
> I don't think the problem is that the stripe is falling out of cache,
> but that it costs so much to get it into memory in the first place.

That makes sense and is demonstrated by measurements.

The following iozone throughput numbers (Kbytes/sec) are from a mirrored
array rather than RAID-Z, but they show how sensitive ZFS becomes to
block size once cache memory requirements start to exceed available
memory.  Since throughput is record size divided by per-operation
latency, small records that each pay a fixed latency show very low
throughput, so this presentation tends to amplify the situation.

(reclen in Kbytes; all other columns in Kbytes/sec)

                                          random   random     bkwd   record   stride
reclen    write  rewrite     read   reread     read    write     read  rewrite     read
     4   367953   143777   496378   488186     6242     2521   836293   786866    30269
     8   249827   166847   621371   489279    12520     4130   929394  1508139    41568
    16   273266   160537   555350   513444    24895     6991   928915  2473915    32016
    32   293463   168727   595128   678359    48666    15831   818962  3708512    43561
    64   284213   168007   694747   514942    99565    95703   705144  3774777   270612
   128   273797   271583  1260035  1366050   187042   512312  1175683  4616660   861089
   256   273265   272916  1259814  1394034   250743   480186   219927  4708927   587602
   512   260630   262145   713797   743914   313429   535920   343209  2603492   583120
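
Since throughput here is record length divided by the time per
operation, the random columns can be turned back into an implied
operation rate and per-operation time.  Below is a minimal Python
sketch using only the random read and random write numbers from the
table; the helper name implied_ops is hypothetical, and the results
are averages over the iozone run, not directly measured latencies.

```python
# Random read / random write throughput (Kbytes/sec) copied from the
# iozone table, keyed by record length in Kbytes.
RANDOM_KBPS = {
    4: (6242, 2521),
    8: (12520, 4130),
    16: (24895, 6991),
    32: (48666, 15831),
    64: (99565, 95703),
    128: (187042, 512312),
    256: (250743, 480186),
    512: (313429, 535920),
}


def implied_ops(reclen_kb, kbytes_per_sec):
    """Return (operations/sec, milliseconds/operation) for one table value.

    throughput = reclen / time_per_op, so ops/sec = throughput / reclen.
    This is an average implied by the run, not a measured latency.
    """
    ops_per_sec = kbytes_per_sec / reclen_kb
    return ops_per_sec, 1000.0 / ops_per_sec


if __name__ == "__main__":
    print(f"{'reclen':>6} {'rd ops/s':>9} {'rd ms':>6} {'wr ops/s':>9} {'wr ms':>6}")
    for reclen, (rd, wr) in sorted(RANDOM_KBPS.items()):
        rd_ops, rd_ms = implied_ops(reclen, rd)
        wr_ops, wr_ms = implied_ops(reclen, wr)
        print(f"{reclen:>6} {rd_ops:>9.0f} {rd_ms:>6.2f} {wr_ops:>9.0f} {wr_ms:>6.2f}")
```

On these numbers the implied random-read rate stays around 1,500
operations/sec from 4K through 64K records, while the implied
random-write rate sits between roughly 440 and 630 operations/sec up
to 32K and then climbs to about 4,000 operations/sec at 128K.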

Clearly random read and random write suffer the most.  Since
sub-block updates force ZFS to read the existing block first, the
random-write performance becomes bottlenecked by the random-read
performance.  When the write is aligned and a multiple of the ZFS
block size, ZFS does not care what is already on disk and writes
very quickly.  Notice that in the above results, random write became
much faster than sequential write at record lengths of 128K and above
(e.g. 512312 vs. 273797 Kbytes/sec at 128K).
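
The read-before-write rule can be modeled in a few lines.  This is a
minimal Python sketch, not ZFS code: it assumes a fixed recordsize of
128K (the ZFS default; RECORDSIZE and records_touched are hypothetical
names) and ignores caching, compression, and file holes, so every
partially covered record is counted as needing a read first.

```python
# Assumed ZFS recordsize in bytes (the default); real datasets may differ.
RECORDSIZE = 128 * 1024


def records_touched(offset, length, recordsize=RECORDSIZE):
    """Split a write into (full, partial) lists of record indices.

    Fully covered records can be written without looking at the old data.
    Partially covered records need the old record read first
    (read-modify-write); this simple model ignores any cached copy.
    """
    if length <= 0:
        return [], []
    end = offset + length
    first = offset // recordsize
    last = (end - 1) // recordsize
    full, partial = [], []
    for rec in range(first, last + 1):
        rec_start = rec * recordsize
        rec_end = rec_start + recordsize
        if offset <= rec_start and end >= rec_end:
            full.append(rec)      # whole record replaced
        else:
            partial.append(rec)   # sub-block update: read old record first
    return full, partial
```

For example, records_touched(4096, 4096) returns ([], [0]), a 4K write
that must first read a whole 128K record, whereas
records_touched(131072, 262144) returns ([1, 2], []), an aligned write
that never needs the old data.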

Bob
======================================
Bob Friesenhahn
[EMAIL PROTECTED], http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
