So while I'm feeling optimistic :-) we really ought to be
  able to do this in two I/O operations. If we have, say, 500K
  of data to write (including all of the metadata), we should
  be able to allocate a contiguous 500K block on disk and write
  that with a single operation. Then we update the Uberblock.
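
(To make that concrete, here is a minimal sketch of the two-I/O idea in
plain C, written against a hypothetical /tmp/fake-pool file with made-up
offsets and sizes; it is not how the real ZFS I/O pipeline is structured,
just the shape of the idea: one big contiguous write, then the uberblock.)

/* Minimal sketch of the "two I/O" idea: one contiguous write for
 * data + metadata, then a single uberblock update.  The device path,
 * offsets, and sizes below are made up for illustration. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAYLOAD_SIZE   (500 * 1024)     /* ~500K of data + metadata    */
#define PAYLOAD_OFFSET (1024 * 1024)    /* hypothetical allocation     */
#define UBER_OFFSET    (128 * 1024)     /* hypothetical uberblock slot */
#define UBER_SIZE      1024

int
main(void)
{
        int fd = open("/tmp/fake-pool", O_RDWR | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return (1); }

        char *payload = calloc(1, PAYLOAD_SIZE);
        char uber[UBER_SIZE] = { 0 };

        /* I/O #1: the whole batch lands in one contiguous write. */
        if (pwrite(fd, payload, PAYLOAD_SIZE, PAYLOAD_OFFSET) != PAYLOAD_SIZE)
                perror("payload write");
        fsync(fd);              /* make sure it is on stable storage */

        /* I/O #2: only now is the uberblock updated to point at the new state. */
        memcpy(uber, "new uberblock", 14);
        if (pwrite(fd, uber, UBER_SIZE, UBER_OFFSET) != UBER_SIZE)
                perror("uberblock write");
        fsync(fd);

        free(payload);
        close(fd);
        return (0);
}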

    The only inherent problem preventing this right now is
  that we don't have general scatter/gather at the driver
  level (ugh). This is a bug that should be fixed, IMO. Then
  ZFS just needs to delay choosing physical block locations
  until they're being written as part of a group.
  (Of course, as NetApp points out in their WAFL papers, the
  goal of optimizing writes can conflict with the goal of
  optimizing reads, so taken to an extreme, this optimization
  isn't always desirable.)
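
(For reference, this is roughly what scatter/gather buys at the
system-call level: several non-contiguous source buffers submitted as
one I/O. The sketch below uses POSIX writev() on a hypothetical file;
the buffer names and contents are placeholders, not real ZFS structures.)

/* Sketch of scatter/gather: several non-contiguous in-memory buffers
 * submitted as a single I/O with POSIX writev().  File and buffer
 * contents are placeholders. */
#include <sys/uio.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
        char data[4096] = "data block";
        char meta1[512] = "indirect block";
        char meta2[512] = "another indirect block";

        struct iovec iov[3] = {
                { .iov_base = data,  .iov_len = sizeof (data)  },
                { .iov_base = meta1, .iov_len = sizeof (meta1) },
                { .iov_base = meta2, .iov_len = sizeof (meta2) },
        };

        int fd = open("/tmp/fake-pool", O_WRONLY | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return (1); }

        /* One system call, one contiguous range on disk, three source buffers. */
        if (writev(fd, iov, 3) == -1)
                perror("writev");

        close(fd);
        return (0);
}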


Hi Anton. Optimistic? A little, yes.

The data blocks should have aggregated quite well into
near-recordsize I/Os; are you sure they did not? There was no
O_DSYNC in here, right?

Once the data blocks are on disk we have the information
necessary to update the indirect blocks iteratively up to
the uberblock. Those are the smaller I/Os; I guess that
because of ditto blocks they go to physically separate
locations, by design.
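
(A toy illustration of that bottom-up ordering, with everything made up:
the /tmp/fake-pool file, the toy_checksum() helper, the offsets. Real
block pointers, checksums and ditto-block placement are of course far
more involved; the point is only that each level can be written once its
child's location and checksum are known, with the uberblock last.)

/* Toy illustration of the bottom-up update: write the data block,
 * then each indirect level in turn, and the uberblock last. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LEVELS  3               /* data plus two indirect levels */
#define BLKSZ   4096

static unsigned int
toy_checksum(const char *buf, size_t len)
{
        unsigned int sum = 0;
        for (size_t i = 0; i < len; i++)
                sum = sum * 31 + (unsigned char)buf[i];
        return (sum);
}

int
main(void)
{
        int fd = open("/tmp/fake-pool", O_WRONLY | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return (1); }

        char block[BLKSZ] = "level 0 data";
        off_t child_offset = 10 * BLKSZ;        /* made-up locations */

        for (int level = 0; level < LEVELS; level++) {
                /* Write this level's block; only once it is on disk do we
                 * know the (offset, checksum) pair its parent must record. */
                pwrite(fd, block, BLKSZ, child_offset);

                unsigned int cksum = toy_checksum(block, BLKSZ);
                off_t parent_offset = child_offset + 100 * BLKSZ;

                /* Build the parent "indirect block": it points at the child. */
                memset(block, 0, BLKSZ);
                snprintf(block, BLKSZ, "level %d: child at %lld cksum %u",
                    level + 1, (long long)child_offset, cksum);
                child_offset = parent_offset;
        }

        /* Final, smallest write: the uberblock referencing the top level. */
        pwrite(fd, block, BLKSZ, 0);
        fsync(fd);
        close(fd);
        return (0);
}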

All of these, though, are normally done asynchronously with
respect to applications, unless the disks are flooded.

But I follow you in that it may be remotely possible to
reduce the number of iterations in the process by assuming
that the I/Os will all succeed; then, if some fail, fix up
the consequences, and when all are done, update the
uberblock. I would not hold my breath quite yet for that.
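
(Sketched very roughly, with hypothetical offsets and deliberately
simplistic error handling, that would look something like the following:
issue all the block writes assuming success, re-issue any failures
elsewhere, and only then write the uberblock.)

/* Sketch of the "assume success, fix up failures" idea: issue all the
 * block writes first, re-issue any that fail to an alternate location
 * (a real fixup would also have to patch the parent pointers), and
 * only then write the uberblock. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define NBLOCKS 4
#define BLKSZ   4096

int
main(void)
{
        int fd = open("/tmp/fake-pool", O_WRONLY | O_CREAT, 0644);
        if (fd == -1) { perror("open"); return (1); }

        char block[BLKSZ] = { 0 };
        off_t offset[NBLOCKS] = { BLKSZ, 2 * BLKSZ, 3 * BLKSZ, 4 * BLKSZ };
        int failed[NBLOCKS] = { 0 };

        /* Pass 1: optimistically issue every write, assuming they all succeed. */
        for (int i = 0; i < NBLOCKS; i++) {
                snprintf(block, BLKSZ, "block %d", i);
                if (pwrite(fd, block, BLKSZ, offset[i]) != BLKSZ)
                        failed[i] = 1;
        }

        /* Pass 2: fix up only the consequences of the failures, here by
         * pretending to re-allocate and re-issuing the write elsewhere. */
        for (int i = 0; i < NBLOCKS; i++) {
                if (!failed[i])
                        continue;
                offset[i] += NBLOCKS * BLKSZ;
                snprintf(block, BLKSZ, "block %d (retried)", i);
                pwrite(fd, block, BLKSZ, offset[i]);
        }

        /* Only once everything has settled does the uberblock get written. */
        fsync(fd);
        snprintf(block, BLKSZ, "uberblock");
        pwrite(fd, block, BLKSZ, 0);
        fsync(fd);

        close(fd);
        return (0);
}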

-r

