Performance-wise, I think you should go for mirrors/raid10 and
separate the pools (i.e. an rpool mirror on SSDs and data mirrors
on HDDs). If you have 4 SSDs, you might mirror the other pair for
zoneroots or for some databases in datasets delegated into zones,
for example. Don't use dedup. Carve out some space for L2ARC.
As Ed noted, you might not want to dedicate much disk space to it,
since the L2ARC headers themselves add RAM pressure; however,
spreading the IO load between smaller cache partitions/slices on
each SSD may help your IOPS on average. Maybe go for compression.
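To make that concrete, here is a minimal sketch assuming made-up
device names (c0t0d0..c0t3d0 as the SSDs, c0t4d0..c0t7d0 as the
HDDs) and made-up pool names; the rpool mirror itself is normally
set up by the installer:

   # rpool mirror on the first SSD pair (slices, as the installer does it)
   zpool create rpool mirror c0t0d0s0 c0t1d0s0

   # second SSD pair for zoneroots and/or delegated database datasets
   zpool create ssdpool mirror c0t2d0s0 c0t3d0s0

   # data pool as a raid10-style stripe of mirrors on the HDDs
   zpool create datapool mirror c0t4d0 c0t5d0 mirror c0t6d0 c0t7d0

   # small L2ARC slices spread over all four SSDs
   zpool add datapool cache c0t0d0s1 c0t1d0s1 c0t2d0s1 c0t3d0s1

   # dedup stays off (the default); lightweight compression on the data pool
   zfs set compression=lzjb datapool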

I really hope someone better versed in compression - like Saso -
will chime in to say whether gzip-9, compared to lzjb (or lz4),
hurts read speeds from the pools. My HDD-based assumption is that,
in general, the less data you read from (or write to) the platters
the better, and spare CPU cycles can usually absorb the
(de)compression cost.
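One way to get a rough answer for your own data (dataset names
here are just examples) is to copy a representative sample into
datasets with different compression settings, time the reads, and
compare the achieved ratios:

   zfs create -o compression=lzjb   datapool/test-lzjb
   zfs create -o compression=gzip-9 datapool/test-gzip9
   # copy the same sample into both, then compare space savings
   zfs get compressratio datapool/test-lzjb datapool/test-gzip9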

I'd spread the different data types (i.e. WORM programs,
WORM-append logs, and other random-io application data) across
various datasets with different settings, backed by different
storage - since you have the luxury.
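For example (dataset names and exact property choices here are
illustrative only, not a recommendation):

   # WORM program/distro trees: compress hard, skip atime updates
   zfs create -o compression=gzip-9 -o atime=off datapool/distro

   # append-only logs: also compress hard
   zfs create -o compression=gzip-9 datapool/logs

   # random-io application data: cheap compression only
   zfs create -o compression=lzjb datapool/appdata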

Many best-practice documents (and the original Sol10/SXCE/LiveUpgrade
requirements) place the zoneroots on the same rpool so they can be
upgraded seamlessly as part of the OS image. However, you can also
delegate ZFS datasets into zones and/or use lofs mounts from the GZ
to the LZs (handy for shared datasets like distros and homes - and
faster/more robust than NFS from GZ to LZ). For OS images (zoneroots)
I'd use gzip-9 or better (likely lz4 once it gets integrated), the
same for logfile datasets, and lzjb, zle or none for the random-io
datasets. For structured things like databases I also research the
application's block IO size and set the dataset's recordsize to
match it (at creation time) to reduce extra work with ZFS COW
during writes - at the expense of more metadata.
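Hypothetical examples of those mechanisms (zone, dataset and path
names are made up; the recordsize value depends on your particular
database):

   # delegate a dataset into a local zone
   zonecfg -z appzone1
     add dataset
     set name=datapool/appzone1-data
     end
     # ... then verify, commit, exit as usual

   # lofs-mount a shared GZ directory (distros, homes) into the zone
   zonecfg -z appzone1
     add fs
     set type=lofs
     set dir=/export/distro
     set special=/export/distro
     end

   # match recordsize to the database block size at creation time,
   # e.g. 8K for PostgreSQL or 16K for InnoDB
   zfs create -o recordsize=8k -o compression=lzjb ssdpool/pgdata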

You'll likely benefit from having OS images on the SSDs, logs on
the HDDs (including logs from the GZ and LZ OSes, to reduce needless
writes on the SSDs), and databases on the SSDs. For other data types
it "depends", but in general they would be helped by L2ARC on the
SSDs.
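To see whether the L2ARC actually earns its keep once things are
running, the ARC kstats on illumos/Solaris expose the L2 hit/miss
counters:

   kstat -p zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses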

Also note that much of the default OS image is not really used
(e.g. X11 on headless boxes), so you might be tempted to do weird
things with the GZ or LZ rootfs data layouts - but these might
puzzle your beadm/liveupgrade software, and you'd have to do any
upgrades with lots of manual labor :)

On a somewhat orthogonal note, I'd start by setting up a generic
"dummy" zone, perhaps with much "unneeded" software, and zfs-cloning
it to spawn application zones. This way you only pay the footprint
price once, at least until you have to upgrade the LZ OSes - at that
point it might be cheaper (in terms of storage, at least) to upgrade
the dummy, clone it again, and port each LZ's customizations
(installed software) by finding the differences between the old
dummy and the current zone state (zfs diff, rsync -cn, etc.). In
such upgrades you're really well served by storing volatile data in
datasets separate from the zone OS root - you just reattach these
datasets to the upgraded OS image and go on serving.
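A rough sketch of that clone-and-diff workflow (all names are
hypothetical, and the exact zoneroot dataset layout depends on your
zone brand and installer):

   # snapshot the installed dummy zone's root, clone it for a new zone
   zfs snapshot ssdpool/zones/dummy@golden
   zfs clone ssdpool/zones/dummy@golden ssdpool/zones/appzone1

   # later, to hunt for local customizations worth porting after an upgrade
   zfs diff ssdpool/zones/dummy@golden ssdpool/zones/appzone1
   rsync -cnav /zones/dummy/root/ /zones/appzone1/root/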

As a particular example of something that is often upgraded and
takes considerable disk space per copy, I'd have the current JDK
installed in the GZ: either simply lofs-mounted from the GZ into
the LZs, or kept in a separate dataset, cloned and delegated into
the LZs (if some - but not all - local zones need further JDK
customizations, e.g. timezone updates, trusted CA certs, etc.).
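For instance (paths and names are illustrative):

   # option 1: plain read-only lofs mount of the GZ's JDK into a zone
   zonecfg -z appzone1
     add fs
     set type=lofs
     set dir=/usr/jdk/latest
     set special=/usr/jdk/latest
     add options ro
     end

   # option 2: keep the JDK in its own dataset; clone it and delegate
   # the clone into the zone (as with the delegated dataset above)
   zfs snapshot ssdpool/jdk@current
   zfs clone ssdpool/jdk@current ssdpool/jdk-appzone1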

HTH,
//Jim Klimov