Erik Trimble wrote:
On 9/22/2010 11:15 AM, Markus Kovero wrote:
Such a configuration was known to cause deadlocks. Even if it works
now (which I don't expect to be the case), it will cause your data to
be cached twice. The CPU utilization will also be much higher, etc.
All in all, I strongly recommend against such a setup.
--
Pawel Jakub Dawidek http://www.wheelsystems.com
p...@freebsd.org http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
Well, CPU utilization can be tuned down by disabling checksums in the
inner pools, since checksumming is already done in the main pool. I'd be
interested in bug IDs for the deadlock issues and anything related.
Caching twice is not an issue; prefetching could be, and it can be
disabled.
I don't understand what makes it difficult for ZFS to handle this
kind of setup. The main pool (testpool) should just allow any
writes/reads to/from the volume, not caring what they are, whereas
anotherpool would just work as any other pool consisting of any other
devices.
This is quite a similar setup to an iSCSI-replicated mirror pool, where
you have a redundant pool created from iSCSI volumes locally and remotely.
Yours
Markus Kovero
Actually, the mechanics of local pools inside pools are significantly
different from using remote volumes (potentially exported ZFS volumes)
to build a local pool.
And, no, you WOULDN'T want to turn off the "inside" pool's checksums.
You're assuming that this would be taken care of by the outside pool,
but that's a faulty assumption, since the only way this would happen
would be if the pools somehow understood they were being nested, and
thus could "bypass" much of the caching and I/O infrastructure related
to the inner pool.
What is an example of where a checksummed outside pool would not be able
to protect a non-checksummed inside pool? Would an intermittent
RAM/motherboard/CPU failure that only corrupted the inner pool's block
before it was passed to the outer pool (and did not corrupt the outer
pool's block) be a valid example?
If checksums are desirable in this scenario, then redundancy would also
be needed to recover from checksum failures.
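A toy sketch of that RAM/CPU scenario (plain Python, with SHA-256 standing in
for whatever checksum the pools actually use) shows why the outer pool can't
help: its checksum is computed over the already-corrupt block and so verifies
cleanly forever after, while the inner pool's checksum, computed before the
corruption, is the only one that disagrees:

import hashlib

def cksum(data):
    # SHA-256 stands in for any ZFS block checksum (fletcher4, sha256, ...).
    return hashlib.sha256(data).hexdigest()

# The inner pool checksums the block while it is still correct.
block = b"application data written through the inner pool"
inner_cksum = cksum(block)

# Hypothetical fault: bad RAM/CPU flips a bit while the block sits in
# memory on its way down to the outer pool's vdev (the zvol).
corrupted = bytearray(block)
corrupted[0] ^= 0x01
corrupted = bytes(corrupted)

# The outer pool checksums whatever it is handed -- the corrupt copy --
# so its checksum verifies cleanly on every later read.
outer_cksum = cksum(corrupted)

print("outer pool verifies:", cksum(corrupted) == outer_cksum)   # True
print("inner pool detects :", cksum(corrupted) != inner_cksum)   # True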
Pools understanding nesting would be a win. Another win, one that might
build on such a pool-to-pool communication interface, would be a ZFS
client (shim? driver?) that extends ZFS checksum protection all the
way out across the network to the workstations accessing ZFS pools. ZFS
offers no protection against corruption between the CIFS/NFS server and
the CIFS/NFS client. (The client would need to mount the pool directly
in the current structure).
----
To quote myself from May 2010:
If someone wrote a "ZFS client", it'd be possible to get over-the-wire
data protection. This would be continuous from the client computer all
the way to the storage device. Right now there is data protection from
the server to the storage device. The best protected apps are those
running on the same server that has mounted the ZFS pool containing the
data they need (in which case they are protected by ZFS checksums and by
ECC RAM, if present).
A "ZFS client" would run on the computer connecting to the ZFS server,
in order to extend ZFS's protection and detection out across the network.
In one model, the ZFS client could be a proxy for communication between
the client and the server running ZFS. It would extend the filesystem
checksumming across the network, verifying checksums locally as data was
requested, and calculating checksums locally before data was sent that
the server would re-check. Recoverable checksum failures would be
transparent except for a performance loss; unrecoverable failures would
be reported as unrecoverable, using the standard OS error message for
unrecoverable reads (Windows has one that it uses for bad sectors on
drives and optical media). The local client checksum calculations would be
useful in detecting network failures, and local hardware instability.
(I.e. if most/all clients start seeing checksum failures...look at the
network; if only one client sees checksum failures, check that client's
hardware.)
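Nothing like this exists today, but as a rough illustration (the function
names and the choice of SHA-256 are mine, not any real ZFS client API), the
two halves of such a proxy might look like:

import hashlib

class ChecksumMismatch(Exception):
    """Locally computed checksum disagrees with the one sent by the server."""

def verify_read(data, server_cksum):
    # Client side of a read: recompute the checksum over the bytes that
    # actually arrived and compare against the checksum sent with them.
    if hashlib.sha256(data).digest() != server_cksum:
        raise ChecksumMismatch("block damaged in transit or by local hardware")
    return data

def prepare_write(data):
    # Client side of a write: checksum before sending, so the server can
    # re-check on arrival and catch corruption introduced on the wire.
    return data, hashlib.sha256(data).digest()

# Toy "server" response: a block plus the checksum stored for it.
stored = b"contents of one file block"
response = (stored, hashlib.sha256(stored).digest())
assert verify_read(*response) == stored            # clean read

damaged = (stored[:-1] + b"!", response[1])        # bit rot on the wire
try:
    verify_read(*damaged)
except ChecksumMismatch as err:
    print("reported as an unrecoverable read:", err)

The same mismatch test covers both failure modes mentioned above: corruption
on the wire and corruption in the client's own hardware.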
An extension to the ZFS client model would allow multi-level ZFS systems
to better coordinate their protection and recover from more scenarios.
By multi-level ZFS, I mean ZFS stacked on ZFS, say via iSCSI. An
example (I'm sure there are better ones) would be 3 servers, each with 3
data disks. Each disk is made into its own non-redundant pool (making 9
non-redundant pools). These pools are in turn shared via iSCSI. One of
the servers creates RAIDZ1 groups using 1 disk from each of the 3 servers.
With a means for ZFS systems to communicate, a failure of any
non-redundant lower-level device need not trigger a system halt of that
lower system, because it will know from the higher-level system that the
device can be repaired/replaced using the higher-level redundancy.
A key to making this happen is an interface to request a block and its
related checksum (or if speaking of CIFS, to request a file, its related
blocks, and their checksums.)
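The interface itself could stay very small. A hypothetical sketch (these
names are invented for illustration; there is no such ZFS or CIFS API today):

from dataclasses import dataclass
from typing import List, Protocol

@dataclass
class BlockWithChecksum:
    offset: int      # byte offset of the block within the file/object
    data: bytes      # raw block contents
    checksum: bytes  # checksum the pool already stores for this block

class ChecksumAwareStore(Protocol):
    """Hypothetical interface a ZFS server (or lower-level pool) could
    expose so the consumer can verify blocks itself, or repair them with
    its own redundancy, instead of the lower level halting on an error."""

    def read_block(self, obj: str, offset: int) -> BlockWithChecksum:
        """One block plus its stored checksum."""
        ...

    def read_file(self, path: str) -> List[BlockWithChecksum]:
        """CIFS-style variant: a whole file as blocks plus checksums."""
        ...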
----
The ability to grow/shrink RAIDZ by adding and removing devices is still
more important, and so is the ability to rebalance pools when a pool is
grown.
Caching is also a huge issue, since ZFS isn't known for being
memory-slim, and as caching is done (currently) on a per-pool level,
nested pools will consume significantly more RAM.
This tells me that nesting itself isn't a cause for additional RAM
consumption. The number of pools is the cause. Minimize the number of
pools to minimize RAM consumption.
Without caching the inner pool, performance is going to suck (even if
some blocks are cached in the outer pool, that pool has no way to do
look-ahead, nor to take other such actions). The nature of delayed writes
can also wreak havoc with caching at both pool levels.
What about not caching the outer pool? Then can we view the inner pool
as using a (now larger) cache to make up for a 'big, slow storage'
device? The inner pool knows which files are being used, so it can do
look-ahead.
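Here is a toy model of the two arrangements (invented classes, not ZFS code).
Turning off the outer level's cache, roughly what setting the zvol's
primarycache property would aim at, keeps one cached copy per block instead
of two, and leaves the caching to the level that actually knows the files:

class Cache:
    def __init__(self, enabled=True):
        self.enabled = enabled
        self.blocks = {}            # block id -> data

    def remember(self, blkid, data):
        if self.enabled:
            self.blocks[blkid] = data

class Pool:
    def __init__(self, backend, cache_enabled=True):
        self.backend = backend      # dict-like "disk", or another Pool
        self.cache = Cache(cache_enabled)

    def read(self, blkid):
        if blkid in self.cache.blocks:
            return self.cache.blocks[blkid]
        data = (self.backend.read(blkid) if isinstance(self.backend, Pool)
                else self.backend[blkid])
        self.cache.remember(blkid, data)
        return data

disk = {n: b"x" * 128 * 1024 for n in range(100)}   # 100 "blocks" of 128K

# Both levels cache: every block read ends up resident twice.
outer = Pool(disk)
inner = Pool(outer)
for n in disk:
    inner.read(n)
print(len(outer.cache.blocks) + len(inner.cache.blocks))   # 200 cached copies

# Outer cache off: one copy per block, held by the level that knows the files.
outer = Pool(disk, cache_enabled=False)
inner = Pool(outer)
for n in disk:
    inner.read(n)
print(len(outer.cache.blocks) + len(inner.cache.blocks))   # 100 cached copies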
Stupid filesystems have no issues with nesting, as they're not doing
anything besides (essentially) direct I/O to the underlying devices.
UFS doesn't have its own I/O subsystem, nor do things like ext* or
xfs. However, I've yet to see any "modern" filesystem do well with
nesting itself - there's simply too much going on under the hood, and
without being "nested-aware" (i.e. specifically coding the filesystem
to understand when it's being nested), many of these backend
optimizations are a recipe for conflict.
Sounds like tunneling TCP over TCP vs. TCP over UDP. In the former case,
optimizations and retries on errors can quickly degrade performance. In
the latter, the lower layer doesn't try to maintain integrity and
instead leaves that job to the application.
TCP over TCP: ZFS over ZFS
TCP over UDP: ZFS over UFS
UDP over UDP: UFS over UFS