Hi all,

This is just an off-the-cuff idea at the moment, but I would like to sound
it out.

Consider the situation where someone has a large amount of off-site data
storage (of the order of 100s of TB or more). They have a slow network link
to this storage.

My idea is that this could be used to build the main vdevs for a ZFS pool.
On top of this, an array of disks (of the order of TBs to 10s of TB) is
available locally, which can be used as L2ARC. There are also smaller,
faster arrays (of the order of 100s of GB) which, in my mind, could be used
as a ZIL.
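
For concreteness, the initial pool might be assembled something like this
(just a sketch; the device names are hypothetical, and I am assuming the
remote storage is exposed as local block devices, e.g. iSCSI LUNs):

    # Remote (slow) storage, exposed as local block devices, forms the main vdevs
    zpool create tank raidz2 c10t0d0 c10t1d0 c10t2d0 c10t3d0 c10t4d0 c10t5d0

    # Local TB-scale disk array as L2ARC (read cache)
    zpool add tank cache c2t0d0 c2t1d0 c2t2d0 c2t3d0

    # Small, fast local array as a separate log device (ZIL)
    zpool add tank log mirror c3t0d0 c3t1d0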

Now, in this theoretical situation, in-play read data is kept on the L2ARC,
and can be accessed about as fast as if this array were just used as the main
pool vdevs. Written data goes to the ZIL, and is then sent down the slow link
to the offsite storage. Rarely used data is still available as if on site
(shows up in the same file structure), but is effectively "archived" to the
offsite storage.
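
In that steady state, something like "zpool iostat -v" should show most reads
being served by the cache devices and synchronous writes landing on the log
devices before trickling out to the main vdevs:

    # Per-vdev I/O breakdown every 5 seconds; cache and log devices
    # are reported separately from the main vdevs
    zpool iostat -v tank 5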

Now, here comes the problem. According to what I have read, the maximum size
for the ZIL is approx 50% of the physical memory in the system, which would
be too small for this particular situation. Also, you cannot mirror the
L2ARC, so a disk failure there would have dire performance consequences.
I also believe (correct me if I am wrong) that the L2ARC is invalidated
on reboot, so it would have to "warm up" again. And finally, if the
network link were to die, I am assuming the entire zpool would become
unavailable.
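
On the mirroring point: as far as I know, the current tools flatly refuse a
mirrored cache vdev, while log devices mirror fine (hypothetical device
names again):

    # A mirrored log device is accepted...
    zpool add tank log mirror c3t0d0 c3t1d0

    # ...but a mirrored cache vdev is rejected; cache devices can
    # only be added as plain, independent devices
    zpool add tank cache mirror c2t0d0 c2t1d0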

This is a setup which I can see many use cases for, but it introduces too
many failure modes.

What I would like to see is an extension to ZFS's hierarchical storage
environment, such that an additional layer can be put behind the main pool
vdevs as an "archive" store (i.e. it goes
[ARC]->[L2ARC/ZIL]->[main]->[archive]). Infrequently used files/blocks could
be pushed into this storage, but still appear to be available as normal. It
would, for example, allow old snapshot data (which is very rarely going to be
read) or files which must be retained for legal reasons to be pushed down.
It would also utilise the bandwidth available more efficiently, as only data
being specifically sent to it would need transferring.
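
To make that concrete, I imagine the administrative side looking something
like the following. This is entirely hypothetical syntax, of course; no
"archive" vdev class or these properties exist today:

    # Hypothetical: attach the off-site storage as an archive tier
    # sitting behind the main vdevs
    zpool add tank archive c10t0d0 c10t1d0

    # Hypothetical: per-dataset policy for demoting cold blocks,
    # e.g. anything not read for 90 days
    zfs set archiveafter=90d tank/projects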

In the case where the archive storage becomes unavailable, there would be a
number of possible actions (e.g. error on access, block on access, make the
files "disappear" temporarily).

I know there are already solutions out there which do similar jobs. The
company I work for use one which pushes "archive" data to a tape stacker,
and pulls it back when accessed. But I think this is a ripe candidate for
becoming part of the ZFS stack.

So, what does everyone think?

Rgds
Karl
