Suppose I have a storage server that runs ZFS, presumably providing
file (NFS) and/or block (iSCSI, FC) services to other machines that
are running Solaris.  Some of the use will be for LDoms and zones[1],
which would create zpools on top of zfs (fs or zvol).  I have concerns
about variable block sizes and the implications for performance.

1. http://hub.opensolaris.org/bin/view/Community+Group+zones/zoss

Suppose that on the storage server, an NFS shared dataset is created
without tuning the block size.  This implies that when the client
(ldom or zone v12n server) runs mkfile or similar to create the
backing store for a vdisk or a zpool, the file on the storage server
will be created with 128K blocks.  Then when Solaris or OpenSolaris is
installed into the vdisk or zpool, files of a wide variety of sizes
will be created.  At this layer they will be created with variable
block sizes (512B to 128K).
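For concreteness, this can be checked on the storage server (dataset name here is hypothetical); an untuned dataset reports the 128K default:

```shell
# Hypothetical dataset name; verify the record size the NFS-shared
# dataset was created with.  An untuned dataset shows 128K.
zfs get recordsize tank/nfsshare
```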

The implications of a 512 byte write in the upper-level zpool (inside
a zone or ldom) seem to be:

- The 512 byte write turns into a 128 KB write at the storage server
  (256x multiplication in write size).
- To write that 128 KB block, the rest of the block needs to be read
  to recalculate the checksum.  That is, a read/modify/write process
  is forced.  (Less impact if block already in ARC.)
- Deduplication is likely to be less effective because it is unlikely
  that the same combination of small blocks in different zones/ldoms
  will be packed into the same 128 KB block.
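The amplification arithmetic above is just the ratio of the server-side record size to the logical write, e.g.:

```shell
# A 512-byte logical write in the guest dirties one full 128 KB
# record on the storage server.
logical_write=512
recordsize=$((128 * 1024))
echo $((recordsize / logical_write))   # prints 256
```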

Alternatively, the block size could be forced to something smaller at
the storage server.  Setting it to 512 bytes could eliminate the
read/modify/write cycle, but would presumably be less efficient (more
metadata and checksum overhead) for moderate to large files.  Setting
it somewhere in between may be desirable as well, but it is not clear
where.  The key competition in this area seems to use a fixed 4 KB
block size.
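As a sketch of how that tuning would look (pool/dataset names are hypothetical, and note that recordsize only applies to blocks written after it is set, so it needs to be in place before the backing files are created):

```shell
# File-backed vdisks: force a smaller record size on the backing
# dataset before mkfile (or similar) creates the backing store.
zfs create -o recordsize=4k tank/vdisks

# Zvol-backed vdisks: the equivalent knob is volblocksize, which
# must be chosen when the volume is created.
zfs create -V 20g -o volblocksize=4k tank/vdisks/ldom1
```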

Questions:

Are my basic assumptions correct, namely that a given file consists
only of a single block size, except perhaps for the final block?

Has any work been done to identify the performance characteristics in
this area?

Is there less to be concerned about from a performance standpoint if
the workload is primarily read?

To maximize the efficacy of dedup, would it be best to pick a fixed
block size and match it between the layers of zfs?

-- 
Mike Gerdts
http://mgerdts.blogspot.com/
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
