Hello Roch,

Monday, May 22, 2006, 3:42:41 PM, you wrote:



RBPE>   Robert Says:

RBPE>   Just to be sure - you did reconfigure the system to actually allow
RBPE>   larger I/O sizes?


RBPE> Sure enough, I messed up (I had no tuning in place when I got the above
RBPE> data), so 1 MB was my maximum transfer size. Using 8 MB I now see:

RBPE>    Bytes sent; elapsed time of phys I/O; avg I/O size; throughput
RBPE>
RBPE>    8 MB;   3576 ms of phys; avg sz : 16 KB; throughput 2 MB/s
RBPE>    9 MB;   1861 ms of phys; avg sz : 32 KB; throughput 4 MB/s
RBPE>    31 MB;  3450 ms of phys; avg sz : 64 KB; throughput 8 MB/s
RBPE>    78 MB;  4932 ms of phys; avg sz : 128 KB; throughput 15 MB/s
RBPE>    124 MB; 4903 ms of phys; avg sz : 256 KB; throughput 25 MB/s
RBPE>    178 MB; 4868 ms of phys; avg sz : 512 KB; throughput 36 MB/s
RBPE>    226 MB; 4824 ms of phys; avg sz : 1024 KB; throughput 46 MB/s
RBPE>    226 MB; 4816 ms of phys; avg sz : 2048 KB; throughput 54 MB/s (was 46 MB/s)
RBPE>    32 MB;  686 ms of phys; avg sz : 4096 KB; throughput 58 MB/s (was 46 MB/s)
RBPE>    224 MB; 4741 ms of phys; avg sz : 8192 KB; throughput 59 MB/s (was 47 MB/s)
RBPE>    272 MB; 4336 ms of phys; avg sz : 16384 KB; throughput 58 MB/s (new data)
RBPE>    288 MB; 4327 ms of phys; avg sz : 32768 KB; throughput 59 MB/s (new data)

RBPE> The data was corrected after it was pointed out that physio will be
RBPE> throttled by maxphys. The new data was obtained after setting:

RBPE>         /etc/system: set maxphys=8388608
RBPE>         /kernel/drv/sd.conf: sd_max_xfer_size=0x800000
RBPE>         /kernel/drv/ssd.conf: ssd_max_xfer_size=0x800000

RBPE>         And setting un_max_xfer_size in "struct sd_lun".
RBPE>         That address was figured out using dtrace and knowing that
RBPE>         sdmin() calls ddi_get_soft_state (details avail upon request).
RBPE>         
RBPE>         And of course disabling the write cache (using format -e)
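
A quick way to double-check that the maxphys change actually took effect
after the reboot (assuming mdb is available on the box and maxphys prints
as a 32-bit value on this build):

        echo maxphys/D | mdb -k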

RBPE>         With this in place I verified that each sdwrite() of up to 8 MB
RBPE>         leads to a single biodone interrupt, using this:

RBPE>         dtrace -n 'biodone:entry,sdwrite:[EMAIL PROTECTED], stack(20)]=count()}'

RBPE>         Note that for 16 MB and 32 MB raw device writes, each default_physio
RBPE>         will issue a series of 8 MB I/Os, so we don't expect any more
RBPE>         throughput from those.
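
For anyone repeating that check: a one-liner along these lines (my
approximation of what the quoted command does - counting biodone() and
sdwrite() entries keyed by probe function and kernel stack) should show
exactly one biodone per sdwrite:

        dtrace -n 'biodone:entry,sdwrite:entry{ @[probefunc, stack(20)] = count(); }'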


RBPE> The script used to measure the rates (phys.d) was also modified, since
RBPE> I was counting the bytes before the I/O had completed, and that made a
RBPE> big difference for the very large I/O sizes.

RBPE> If you take the 8 MB case, the above rates correspond to the time it
RBPE> takes to issue and wait for a single 8 MB I/O to the sd driver. So this
RBPE> time certainly includes one seek and ~0.13 seconds of data transfer,
RBPE> then the time to respond to the interrupt, and finally the wakeup of
RBPE> the thread waiting in default_physio(). Given that the data transfer
RBPE> rate using 4 MB is very close to the one using 8 MB, I'd say that at
RBPE> 60 MB/sec all the fixed-cost elements are well amortized. So I would
RBPE> conclude from this that the limiting factor is now the device itself
RBPE> or the data channel between the disk and the host.
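
(Quick sanity check on those numbers: 8 MB in ~0.13 s of pure media
transfer is roughly 61 MB/s, so one seek plus the interrupt handling and
the default_physio() wakeup per 8 MB request only costs a couple of MB/s,
which matches the ~59 MB/s observed.)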


RBPE> Now recall the throughput that ZFS gets during a spa_sync when driven
RBPE> by a single dd, keeping in mind that ZFS will issue 128K I/Os:

RBPE>    1431 MB; 23723 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>    1387 MB; 23044 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>    2680 MB; 44209 ms of spa_sync; avg sz : 127 KB; throughput 60 MB/s
RBPE>    1359 MB; 24223 ms of spa_sync; avg sz : 127 KB; throughput 56 MB/s
RBPE>    1143 MB; 19183 ms of spa_sync; avg sz : 126 KB; throughput 59 MB/s
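
For reference, a minimal way to collect comparable spa_sync timings (this
is not Roch's phys.d, just fbt entry/return timing on spa_sync; the file
name is mine) would be:

        #!/usr/sbin/dtrace -s
        /* spa_sync_time.d: bucket spa_sync() durations in milliseconds */
        fbt::spa_sync:entry  { self->t = timestamp; }
        fbt::spa_sync:return /self->t/ {
            @["spa_sync duration (ms)"] = quantize((timestamp - self->t) / 1000000);
            self->t = 0;
        }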


RBPE> My disk is

RBPE>        <HITACHI-DK32EJ36NSUN36G-PQ08-33.92GB>.

Is it over FC or just SCSI/SAS?

I have to try again with SAS/SCSI - maybe larger I/Os give better results
on FC than on SCSI because FC has more per-I/O overhead?
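
When I do, I'll drive the raw device directly with something along the
lines of (the device path is just an example, on a scratch disk, with the
write cache disabled via format -e as above):

  dd if=/dev/zero of=/dev/rdsk/c0t1d0s0 bs=8192k count=32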

-- 
Best regards,
 Robert                            mailto:[EMAIL PROTECTED]
                                       http://milek.blogspot.com

