Justin wrote:
I have a question about using mixed vdevs in the same zpool and what the
community opinion is on the matter. Here is my setup:
I have four 1TB drives and two 500GB drives. When I first set up ZFS I was
under the assumption that it does not really care much about how you add
devices to the pool and assumes you are thinking things through. But when I
tried to create a pool (called "group") with the four 1TB disks in a raidz
and the two 500GB disks in a mirror configuration in the same pool, ZFS
complained and said that if I wanted to do it I had to add -f (which I
assume stands for "force"). So was ZFS attempting to stop me from doing
something generally considered bad?
Were any of these drives previously part of another pool? If so, ZFS
will usually complain if it finds a signature already on the drive and
make you use the '-f' option. Otherwise, I don't /think/ it should care
if you are being foolish. <wink>
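For reference, the sort of thing that triggers it looks like this (device
names taken from your listing below; the exact wording of the complaint
may differ):

# zpool create group raidz c7t0d0 c7t1d0 c8t0d0 c8t1d0 mirror c10d0 c10d1
  (refused, e.g. for mismatched replication levels or a leftover label)
# zpool create -f group raidz c7t0d0 c7t1d0 c8t0d0 c8t1d0 mirror c10d0 c10d1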
Some other questions I have; let's assume that this setup isn't that bad (or
it is that bad and these questions will be why):
If one 500GB disk dies (c10dX) in the mirror and I choose not to replace it,
would I be able to migrate the files that are on the other half of the mirror
(the disk that still works) over to the drives in the raidz configuration,
assuming there is space? Would ZFS inform me which files are affected, like
it does in other situations?
No, you can't currently. Essentially what you are asking is if you can
remove the mirror from the pool - this is not currently possible, though
I'm hopeful it may happen in the not-so-distant future.
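The workaround today is to copy everything somewhere else, rebuild the
pool with the layout you want, and copy it back. A rough sketch, assuming
you have a second pool (here called "backup") with enough free space:

# zfs snapshot -r group@migrate
# zfs send -R group@migrate | zfs receive -d backup
# zpool destroy group
  (recreate "group" with the vdev layout you want, then send the data back)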
In this configuration, how does Solaris/ZFS determine which vdev to place the
current write operation's worth of data into?
It will attempt to balance the data across the two vdevs (the mirror and
raidz) until it runs out of space on one (in your case, the mirror
pair). ZFS does not currently understand differences in underlying
hardware performance or vdev layout, so it can't "magically" decide to
write data to one particular vdev over the other. In fact, I can't
really come up with a sane way to approach that problem - there are
simply too many variables to allow for automagic optimization like
that. Perhaps if there were some way to "hint" to ZFS upon pool creation,
like "prefer vdev A for large writes, vdev B for small writes", but even
so, I think that's marching off into a wilderness we don't want to
visit, let alone spend any time in.
I would consider this a poor design, as the vdevs have very different
performance profiles, which hurts the overall performance of the pool
significantly.
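As a rough sanity check on capacity, the AVAIL figure in your zfs list
output below lines up with this layout: the 4-disk raidz1 contributes
about 3 x 1TB of usable space (one disk's worth goes to parity) and the
mirror another ~0.5TB, i.e. roughly 3.5TB, which shows up as about 3.12T
once you allow for binary-vs-decimal terabytes and a little overhead.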
Are there any situations where data would, for some reason, not be protected against single disk failures?
No. In your config, both vdevs can survive a single disk failure, so the
pool is fine.
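If you want to convince yourself, you can simulate a single-disk failure
non-destructively with something like:

# zpool offline group c10d0
# zpool status group
  (the pool shows DEGRADED but stays available)
# zpool online group c10d0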
Would this configuration survive a two disk failure if the disks are in separate vdevs?
Yes. Each vdev can tolerate the loss of one disk, so losing one drive
from the raidz1 and one from the mirror at the same time leaves the pool
degraded but intact. Two failures within the same vdev would not be
survivable.
jsm...@corax:~# zpool status group
  pool: group
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        group       ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c7t0d0  ONLINE       0     0     0
            c7t1d0  ONLINE       0     0     0
            c8t0d0  ONLINE       0     0     0
            c8t1d0  ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c10d0   ONLINE       0     0     0
            c10d1   ONLINE       0     0     0

errors: No known data errors

jsm...@corax:~# zfs list group
NAME    USED  AVAIL  REFER  MOUNTPOINT
group  94.4K  3.12T  23.7K  /group
This isn't for a production environment in some datacenter, but I would
nevertheless like to keep the data as reasonably secure as possible while
maximizing total storage space.
If you are using Solaris (which you seem to be doing), my recommendation
is that you use SVM to create a single 1TB concat device from the two
500GB drives, then use that 1TB concat device along with the four
physical 1TB drives to create your pool. If one 500GB drive fails, the
whole concat device fails with it; ZFS treats the concat as a single
"disk" and behaves accordingly.
Thus, my suggestion is something like this (using your cX layout from the
example above):
# metainit d0 2 1 c10d0s2 1 c10d1s2
# zpool create tank raidz c7t0d0 c7t1d0 c8t0d0 c8t1d0 /dev/md/dsk/d0
This would get you a RAIDZ of 4TB capacity or thereabouts, able to
survive one disk failure (or both 500GB drives failing, since ZFS sees
them as a single device).
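Afterwards you can watch both layers with the usual status commands, and
if one of the 500GB drives does die, the repair happens at the SVM layer
first (rebuild the concat), after which ZFS can resilver it as a single
replaced "disk". Something like:

# metastat d0
# zpool status tank
# zpool replace tank /dev/md/dsk/d0    (after the concat has been rebuilt)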
--
Erik Trimble
Java System Support
Mailstop: usca22-123
Phone: x17195
Santa Clara, CA
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss