I think this is a systems engineering problem, not just a ZFS problem.
Few have bothered to look at mount performance in the past because
most systems have only a few mounted file systems[1]. Since ZFS does
file system quotas instead of user quotas, we now have situations where there can be thousands of mounts, so mount performance needs a closer look. We're doing some of that work now, and looking at other possible solutions (CR6478980).
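If you want to quantify the effect on your own box, here's a rough sketch (an illustration only, not the fix for CR6478980) that mounts every dataset in a pool with a small thread pool and reports the elapsed time; the pool name "home" and the thread count are placeholders for whatever you actually use:

#!/usr/bin/env python
# Rough sketch, not the CR fix: mount every dataset in a pool using a
# small thread pool and report the elapsed time.  The pool name "home"
# and the worker count are placeholders.
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

POOL = "home"

def child_datasets(pool):
    out = subprocess.run(["zfs", "list", "-H", "-r", "-o", "name", pool],
                         capture_output=True, text=True, check=True).stdout
    return [name for name in out.splitlines() if name != pool]

def mount_one(dataset):
    # check=False: ignore errors from datasets that are already mounted
    subprocess.run(["zfs", "mount", dataset], check=False)

if __name__ == "__main__":
    datasets = child_datasets(POOL)
    start = time.time()
    with ThreadPoolExecutor(max_workers=16) as ex:
        list(ex.map(mount_one, datasets))
    print("mounted %d datasets in %.1f seconds" % (len(datasets),
                                                   time.time() - start))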
[1] we've done some characterization of this while benchmarking Sun
Cluster failovers. The time required for a UFS mount can be quite
substantial, even when fsck is not required, and is also somewhat
variable (from a few seconds to tens of seconds). We've made some minor
changes to help improve cluster failover wrt mounts, so perhaps we
can look at our characterization data again and see if there is some
low-hanging fruit which would also apply more generally.
-- richard
Kory Wheatley wrote:
Currently we are trying to set up ZFS file systems for all our user
accounts under /homea /homec /homef /homei /homem /homep /homes and
/homet. Right now on our Sun Fire V890 with 4 dual-core processors and
16 GB of memory we have 12,000 ZFS file systems set up, which Sun has
promised will work, but we didn't know that it would take over an hour
to reboot this machine to mount and unmount all these file systems.
What we're trying to accomplish is the best performance along with the
best data protection. Sun says that ZFS supports millions of file
systems, but what they left out is how long it takes to do a reboot
when you have thousands of file systems.
Currently we have three LUNs on our EMC disk array, from which we've
created one ZFS storage pool, and we've created these 12,000 ZFS file
systems in that pool.
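For concreteness, the sort of loop that produces a layout like that might look roughly like the sketch below; the pool name "home", the 2 GB quota, the mountpoint scheme, and the students.txt user list are all placeholders for illustration, not details of our actual setup:

#!/usr/bin/env python
# Sketch only: create one ZFS file system per student with a quota.
# The pool name, quota size, mountpoint scheme, and user list file are
# placeholders for illustration.
import subprocess

POOL = "home"
QUOTA = "2G"

def create_student_fs(user):
    dataset = "%s/%s" % (POOL, user)
    # e.g. user "smith" lands under /homes/smith
    mountpoint = "/home%s/%s" % (user[0], user)
    subprocess.run(["zfs", "create", dataset], check=True)
    subprocess.run(["zfs", "set", "quota=%s" % QUOTA, dataset], check=True)
    subprocess.run(["zfs", "set", "mountpoint=%s" % mountpoint, dataset],
                   check=True)

if __name__ == "__main__":
    with open("students.txt") as f:          # hypothetical user list
        for line in f:
            if line.strip():
                create_student_fs(line.strip())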
We really don't want to have to go with UFS to create our student user
accounts. We like the flexibility of ZFS, but the slow boot process
will kill us when we have to apply patches that require a reboot.
These ZFS file systems will contain all the student data, so
reliability and performance are key for us. Do you know a way, or a
different setup for ZFS, to allow our system to boot up faster?
I know each mount takes up memory, so that's part of the slowness when
mounting and unmounting. We know that when the system is up the kernel
is using 3 GB of memory out of the 16 GB, and there's nothing else on
this box right now but ZFS. There's no data in those thousands of file
systems yet.
Richard Elling wrote:
Jim Mauro wrote:
(I'm probably not the best person to answer this, but that has never stopped me before, and I need to give Richard Elling a little more time to get the Goats, Cows and Horses fed, sip his morning coffee, and offer a proper response...)
chores are done, wading through the morning e-mail...
Would it benefit us to have the disks set up as a raidz along with the
hardware RAID 5 that is already set up?
Way back when, we called such configurations "plaiding", which
described a host-based RAID configuration
that criss-crossed hardware RAID LUNs. In doing such things, we had
potentially better data availability
with a configuration that could survive more failure modes.
Alternatively, we used the hardware RAID
for the availability configuration (hardware RAID 5), and used
host-based RAID to stripe across hardware
RAID5 LUNs for performance. Seemed to work pretty well.
Yep, there are various ways to do this and, in general, the more copies of the data you have, the better reliability you have. Space is also fairly easy to calculate. Performance can be tricky, and you may need to benchmark with your workload to see which is better, due to the difficulty in modeling such systems.
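To make the two layouts concrete, here's roughly what's being compared; the c2t*d0 device names and the pool name "tank" are invented stand-ins for the hardware RAID 5 LUNs presented by the array:

#!/usr/bin/env python
# Illustration only: the two pool layouts discussed above.  The c2t*d0
# names and pool name are invented stand-ins, not the poster's devices.
import subprocess

LUNS = ["c2t0d0", "c2t1d0", "c2t2d0"]

# "Plaid": host-based raidz criss-crossing the hardware RAID5 LUNs; the
# pool survives the loss of an entire LUN at the cost of one LUN of space.
plaid = ["zpool", "create", "tank", "raidz"] + LUNS

# Stripe: rely on the array for redundancy and use a plain dynamic stripe
# across the LUNs for performance (no extra host-side parity).
stripe = ["zpool", "create", "tank"] + LUNS

if __name__ == "__main__":
    subprocess.run(plaid, check=True)    # or: subprocess.run(stripe, ...)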
In theory, a raidz pool spread across some number of underlying
hardware RAID 5 LUNs would offer protection against more failure
modes, such as the loss of an entire RAID 5 LUN. So from a failure
protection/data availability point of view, it offers some benefit.
Now, whether or not you experience a real, measurable benefit over
time is hard to say. Each additional level of protection/redundancy
has a diminishing return, oftentimes at a dramatic incremental cost
(e.g. getting from "four nines" to "five nines").
If money were no issue, I'm sure we could come up with an awesome
solution :-)
Or will this double RAID slow our performance, with both a software
and a hardware RAID setup?
You will certainly pay a performance penalty - using raidz across the
RAID 5 LUNs will reduce the deliverable IOPS from those LUNs. Whether
or not the performance trade-off is worth the RAS gain depends on your
RAS and data availability requirements.
Fast, inexpensive, reliable: pick two.
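A back-of-envelope way to see the trade-off (assumed numbers, not measurements from this array): a raidz group tends to deliver roughly the small-random-read IOPS of a single member, since the whole group participates in each read, while a plain stripe scales with the number of LUNs.

#!/usr/bin/env python
# Back-of-envelope only -- the per-LUN figure is an assumption, not a
# measurement.  Shows why raidz over N LUNs can cost random-read IOPS
# relative to a plain stripe over the same N LUNs.
LUN_RANDOM_READ_IOPS = 1000   # assumed per hardware RAID5 LUN
N_LUNS = 3

stripe_iops = N_LUNS * LUN_RANDOM_READ_IOPS   # reads spread across LUNs
raidz_iops = LUN_RANDOM_READ_IOPS             # whole group serves each read

print("stripe across %d LUNs: ~%d small random read IOPS" % (N_LUNS, stripe_iops))
print("raidz  across %d LUNs: ~%d small random read IOPS" % (N_LUNS, raidz_iops))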
Or would a raidz setup be better than the hardware RAID 5 setup?
Assuming a robust RAID 5 implementation with battery-backed NVRAM
(protecting against the "write hole" and partial stripe writes), I
think a raidz zpool covers more of the datapath than a hardware RAID 5
LUN, but I'll wait for Richard to elaborate here (or tell me I'm
wrong).
In general, you want the data protection in the application, or as close to the application as you can get. Since programmers tend to be lazy (Gosling said it, not me! :-), most rely on the file system and underlying constructs to ensure data protection. So, having ZFS manage the data protection will always be better than having some box at the other end of a wire managing the protection.
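One concrete payoff of keeping the protection in ZFS is that the host can verify (and, with a redundant pool, repair) the data itself; a periodic scrub exercises exactly that. A minimal sketch, with "tank" as a placeholder pool name:

#!/usr/bin/env python
# Minimal sketch: start a scrub so ZFS re-verifies every block against
# its checksum, then report pool health.  "tank" is a placeholder name.
import subprocess

POOL = "tank"

subprocess.run(["zpool", "scrub", POOL], check=True)
status = subprocess.run(["zpool", "status", "-x", POOL],
                        capture_output=True, text=True, check=True).stdout
print(status)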
Also, if we do set up the disks as a raidz, would it benefit us more
to specify each disk individually in the raidz, or to create them as
LUNs and then build the raidz on top of those?
Isn't this the same question as the first question? I'm not sure
what you're asking here...
The questions you're asking are good ones, and date back to the
decades-old struggle around configuration tradeoffs for performance /
availability / cost.
My knee-jerk reaction is that one level of RAID, either hardware
RAID 5 or ZFS raidz, is sufficient for availability, and keeps things
relatively simple (and simple also improves RAS). The advantage
host-based RAID has always had over hardware RAID is the ability to
create software LUNs (like a raidz1 or raidz2 zpool) across physical
disk controllers, which may also cross SAN switches, etc. So, if it
were me, I'd go with non-hardware-RAID5 devices from the storage
frame, and create raidz1 or raidz2 zpools across controllers.
This is reasonable.
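Concretely, that could look something like the sketch below; the c2/c3 controller and device names and the pool name "home" are invented for the example, and the point is simply that each raidz2 group spans both controllers:

#!/usr/bin/env python
# Sketch of the suggestion above: plain (non-RAID5) LUNs from the frame,
# with a host-side raidz2 laid across two controllers.  The c2/c3 device
# names and the pool name are invented for the example.
import subprocess

ctrl_a = ["c2t0d0", "c2t1d0"]   # LUNs presented through controller c2
ctrl_b = ["c3t0d0", "c3t1d0"]   # LUNs presented through controller c3

# A 4-wide raidz2 with two members per controller tolerates any two
# device failures, which includes losing everything behind one controller.
cmd = ["zpool", "create", "home", "raidz2"] + ctrl_a + ctrl_b
subprocess.run(cmd, check=True)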
But, that's me...
:^)
/jim
The important thing is to protect your data. You have lots of options here, so we'd need to know more precisely what the other requirements are before we could give better advice.
-- richard