I think this is a systems engineering problem, not just a ZFS problem.
Few have bothered to look at mount performance in the past because
most systems have only a few mounted file systems[1].  Since ZFS does
file system quotas instead of user quotas, we now have the situation
where there can be thousands of mounts, so we need to look at mount
performance more closely.  We're doing some of that work now, and
looking at other possible solutions (CR6478980).
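
If you want to get a feel for the numbers on a test box, here is a rough
sketch (the pool and device names are made up; don't try this on a
production pool):

   # build a throwaway pool with a couple thousand file systems
   zpool create testpool c2t0d0
   i=1
   while [ $i -le 2000 ]; do
       zfs create testpool/fs$i
       i=`expr $i + 1`
   done

   # export the pool, then time how long the import (and all the mounts) take
   zpool export testpool
   ptime zpool import testpool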

[1] we've done some characterization of this while benchmarking Sun
Cluster failovers. The time required for a UFS mount can be quite
substantial, even when fsck is not required, and is also somewhat
variable (from a few seconds to tens of seconds).  We've made some minor
changes to help improve cluster failover wrt mounts, so perhaps we
can look at our characterization data again and see if there is some
low-hanging fruit which would also apply more generally.
 -- richard

Kory Wheatley wrote:
Currently we are trying to set up ZFS file systems for all our user accounts under /homea /homec /homef /homei /homem /homep /homes and /homet. Right now on our Sun Fire V890 with 4 dual-core processors and 16 GB of memory we have 12,000 ZFS file systems set up, which Sun has promised will work, but we didn't know that it would take over an hour to reboot this machine to mount and umount all these file systems. What we're trying to accomplish is the best performance along with the best data protection. Sun says that ZFS supports millions of file systems, but what they left out is how long it takes to do a reboot when you have thousands of file systems. Currently we have three LUNs on our EMC disk array from which we've created one ZFS storage pool, and we've created these 12,000 ZFS file systems in that pool.

We really don't want to have to go back to UFS to create our student user accounts. We like the flexibility of ZFS, but with the slow boot process it will kill us when we have to implement patches that require a reboot. These ZFS file systems will contain all the student data, so reliability and performance are key for us. Do you know a way, or a different setup for ZFS, to allow our system to boot up faster? I know each mount takes up memory, so that's part of the slowness when mounting and umounting. We know that when the system is up the kernel is using 3 GB of memory out of the 16 GB, and there's nothing else on this box right now but ZFS. There's no data in those thousands of file systems yet.
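
(For reference, the sort of layout being described is roughly the following; the pool, device, and account names and the quota value here are made up:)

   # one pool across the three array LUNs
   zpool create home c4t0d0 c4t1d0 c4t2d0

   # one file system per student, with a ZFS quota in place of a UFS user quota
   zfs create home/homea
   zfs set mountpoint=/homea home/homea
   zfs create home/homea/student1
   zfs set quota=500m home/homea/student1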

Richard Elling wrote:
Jim Mauro wrote:
(I'm probably not the best person to answer this, but that has never stopped me before, and I need to give Richard Elling a little more time to get the Goats, Cows
and Horses fed, sip his morning coffee, and offer a proper response...)

chores are done, wading through the morning e-mail...

Would it benefit us to have the disks set up as a raidz along with the hardware RAID 5 that is already set up?
Way back when, we called such configurations "plaiding", which described a host-based RAID configuration that criss-crossed hardware RAID LUNs. In doing such things, we had potentially better data availability with a configuration that could survive more failure modes. Alternatively, we used the hardware RAID for the availability configuration (hardware RAID 5), and used host-based RAID to stripe across hardware
RAID5 LUNs for performance. Seemed to work pretty well.

Yep, there are various ways to do this and, in general, the more copies
of the data you have, the better reliability you have.  Space is also
fairly easy to calculate.  Performance can be tricky, and you may need to
benchmark with your workload to see which is better, due to the difficulty
in modeling such systems.
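
For concreteness, the two layouts Jim describes look roughly like this
(the LUN names are made up):

   # "plaiding": host-based raidz across three hardware RAID-5 LUNs
   zpool create tank raidz c4t0d0 c4t1d0 c4t2d0

   # or: plain dynamic striping across the RAID-5 LUNs, leaving all
   # redundancy to the array
   zpool create tank c4t0d0 c4t1d0 c4t2d0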

In theory, a raidz pool spread across some number of underlying hardware RAID 5 LUNs would offer protection against more failure modes, such as the loss of an entire RAID 5 LUN. So from a failure protection / data availability point of view, it offers some benefit. Now, whether or not you would experience a real, measurable benefit over time is hard to say. Each additional level of protection/redundancy has a diminishing return, oftentimes at a dramatic incremental cost (e.g. getting from "four nines" to "five nines").

If money were no issue, I'm sure we could come up with an awesome solution :-)

Or would this double RAID slow our performance, with both a software and a hardware RAID setup?
You will certainly pay a performance penalty - using raidz across the RAID 5 LUNs will reduce the deliverable IOPS from those LUNs. Whether or not the performance trade-off is worth the RAS gain varies based on
your RAS and data availability requirements.

Fast, inexpensive, reliable: pick two.

Or would a raidz setup be better than the hardware RAID 5 setup?
Assuming a robust RAID 5 implementation with battery-backed NVRAM (to protect against the "write hole" and partial-stripe writes), I think a raidz zpool covers more of the datapath than a hardware RAID 5 LUN, but
I'll wait for Richard to elaborate here (or tell me I'm wrong).

In general, you want the data protection in the application, or as close to the application as you can get. Since programmers tend to be lazy (Gosling said it, not me! :-), most rely on the file system and underlying constructs to ensure data protection. So, having ZFS manage the data protection will
always be better than having some box at the other end of a wire managing
the protection.

Also, if we do set up the disks as a raidz, would it benefit us more to specify each disk in the raidz, or to create them as LUNs and then specify the setup in raidz?
Isn't this the same as the first question? I'm not sure what you're asking here...

The questions you're asking are good ones, and they date back to the decades-old struggle
around configuration tradeoffs for performance / availability / cost.

My knee-jerk reaction is that one level of RAID, either hardware RAID 5 or ZFS raidz, is sufficient for availability, and it keeps things relatively simple (and simple also improves RAS). The advantage host-based RAID has always had over hardware RAID is the ability to create software LUNs (like a raidz1 or raidz2 zpool) across physical disk controllers, which may also cross SAN switches, etc. So, were it me, I'd go with non-hardware-RAID5 devices from the storage frame,
and create raidz1 or raidz2 zpools across controllers.
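
Something along these lines, for example (the controller and target
numbers are made up, so adjust for your frame):

   # raidz2 built from plain (non-RAID-5) LUNs spread across two controllers
   zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c2t0d0 c2t1d0 c2t2d0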

This is reasonable.

But, that's me...
:^)

/jim

The important thing is to protect your data. You have lots of options here, so we'd need to know more precisely what the other requirements are before
we could give better advice.
 -- richard
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
