Anton B. Rang writes:

 > > When you have a striped storage device under a
 > > file system, then the database or file system's view
 > > of contiguous data is not contiguous on the media.
 > 
 > Right.  That's a good reason to use fairly large stripes.  (The
 > primary limiting factor for stripe size is efficient parallel access;
 > using a 100 MB stripe size means that an average 100 MB file gets less
 > than two disks' worth of throughput.) 
 > 
 > ZFS, of course, doesn't have this problem, since it's handling the
 > layout on the media; it can store things as contiguously as it wants. 
 > 


It can do what it wants. But currently what it does is keep
files subject to small random writes contiguous only down to
the level of the ZFS recordsize. After a significant run of
random writes, the file ends up with a scattered on-disk
layout. This should work well for the transactional part of
the workload. But the implication of using a small recordsize
is that large sequential scans of the file will keep the disk
heads very busy fetching or prefetching recordsize chunks.
With more spindles and a good prefetch algorithm you can
reach whatever throughput you need. The problem is that your
scanning ops will create heavy competition at the spindle
level, impacting transactional response time (once you have
150 IOPS on every spindle just prefetching data for your full
table scans, the OLTP side will suffer). Yes, we do need data
to characterise this, but the physics are fairly clear.
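
To put rough numbers on it, a back-of-envelope sketch (the 150
IOPS figure is from above; the recordsizes are illustrative):

```shell
# Per-spindle scan throughput = random IOPS x recordsize.
# 150 IOPS per spindle is the figure from the paragraph above;
# the recordsize values below are only illustrative.
iops=150
for rs_kb in 8 16 128; do
    echo "recordsize=${rs_kb}K: $((iops * rs_kb)) KB/s per spindle"
done
```

At 8K records a fully busy spindle delivers about 1.2 MB/s of
scan data; at 128K the same 150 IOPS yield closer to 19 MB/s.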


The BP (best practice) suggesting a small recordsize needs to be updated.

We need to strike a balance between random writes and
sequential reads, which does imply using records larger than
the 8K/16K DB blocks.
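
For anyone wanting to experiment, recordsize is a per-dataset
property (the dataset name below is a placeholder). Note it
only affects blocks written after the change:

```shell
# Check the current recordsize of a dataset (tank/db is a placeholder).
zfs get recordsize tank/db

# Raise it, e.g. to 128K, before loading the data; existing blocks
# keep their old size, only newly written blocks use the new value.
zfs set recordsize=128K tank/db
```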

-r

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
