Hi all,

Longtime reader, first-time poster. Sorry for the lengthy intro; I'm not sure 
the title matches what I'm after. I'm trying to find a way to use a ZFS 
filesystem to shorten our backup window. Currently, our backup solution splits 
a mirrored disk holding UFS or VxFS filesystems, mounts the detached half on 
an off-host backup server, and writes the data directly to tape from there. 
This amounts to a "full" backup every time, and the split mirror stays 
attached to the backup host for a significant amount of time before it can be 
returned. I'd like to work a ZFS filesystem into the mix, hopefully take 
advantage of its space-saving snapshot capabilities, and find out whether 
there is a known way to turn this into an "incremental" backup using freeware 
or OS-level tools.

I understand that if the source data lived on ZFS instead of UFS/VxFS, I could 
take snapshots of the storage before the mirror split, mount that storage on 
the off-host system, and then take deltas between snapshots, either turning 
them into files or applying them directly to another ZFS filesystem on the 
off-host that was originally seeded from the detached mirror. We could also 
skip the mirror split entirely and just do a zfs send, piping the data to a 
remote host where the snapshot would be recreated. It will take a while before 
I can get ZFS running in production, as it might involve some brainwashing of 
our DBAs, so in the meantime, what are some thoughts on how to do this without 
the data sitting on a ZFS source?
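For reference, the ZFS-native flow I'm describing would look something like 
this (pool, filesystem, and host names below are just placeholders; zfs send 
-i is the standard way to produce an incremental stream between two 
snapshots):

```shell
# On the production host -- 'tank/db' and 'backuphost' are hypothetical.

# One-time full replication to the backup host:
zfs snapshot tank/db@base
zfs send tank/db@base | ssh backuphost zfs receive backup/db

# Later: take a new snapshot and send only the blocks that changed
# between @base and @today (the incremental stream):
zfs snapshot tank/db@today
zfs send -i tank/db@base tank/db@today | \
    ssh backuphost zfs receive backup/db
```

The incremental stream is sized by the blocks that actually changed, which is 
exactly the "delta" behavior I'd like to approximate for non-ZFS sources.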

Some of my questions are about keeping this data management at a "file" level. 
I would like to use a ZFS filesystem as the repository, and have that 
repository house the data as efficiently as possible. Suppose I copied binary 
database files that originated on a UFS/VxFS filesystem onto the ZFS 
filesystem and took a snapshot of that data. How could I then update the data 
on that ZFS filesystem with more current copies of those files, and have ZFS 
recognize that the files are mostly the same, with only some differing bits? 
If a file on the live ZFS filesystem is fully overwritten by a new file of 
the same name, can it share the same "block" space with the copy held in the 
snapshot? I'm not sure I'm stating that clearly. I don't know how to recreate 
data on a ZFS filesystem in such a way that a snapshot shares blocks with data 
that is actually the same. I do know that if I tar -c | tar -x or find | cpio 
data onto a ZFS filesystem, take a snapshot, repeat the operation on the same 
set of files, and take another snapshot, both snapshots report consuming space 
equal to the total size of the files copied. So they are not sharing the same 
blocks on disk. Do I understand that correctly?
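As I understand it, that observation is expected: ZFS is copy-on-write at the 
block level, so a snapshot only shares blocks that were never rewritten, and a 
full re-copy rewrites every block even when the contents are identical. A 
minimal sketch of the experiment (the pool/filesystem names are hypothetical, 
and this needs root plus an existing pool):

```shell
# Hypothetical pool 'tank'; /data/dbfiles stands in for the source files.
zfs create tank/repo
cp -r /data/dbfiles /tank/repo/        # first copy of the files
zfs snapshot tank/repo@snap1

cp -r /data/dbfiles /tank/repo/        # re-copy the exact same files
zfs snapshot tank/repo@snap2

# Because cp rewrote every block, the blocks from the first copy are now
# referenced only by snap1, so its USED column shows roughly the full
# data size -- no sharing with the live filesystem:
zfs list -t snapshot -o name,used,refer
```

So any scheme that rewrites whole files will pin a full old copy in the prior 
snapshot; only updates that leave unchanged blocks untouched get the space 
savings.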

What I'm really looking for is a way to shrink our backup window by using some 
"tool" that can look at a binary file at two different points in time, say one 
copy in a ZFS snapshot and one from a different filesystem, e.g. a current 
split of a mirror housing a ZFS/UFS/VxFS filesystem, mounted on a host that 
can see both. Is there a way to compare the two files and write only the 
portions that differ to the copy on the ZFS filesystem, so that after a new 
snapshot is taken, the snapshot only accounts for the delta of bits inside the 
file that changed? I thought rsync could handle this, but my impression is 
that when the size or timestamp differs, rsync considers the file changed, and 
for local copies it defaults to transferring the whole file rather than using 
its delta algorithm; by default it also writes to a temporary file and renames 
it, which would rewrite every block anyway. I'm really not that well versed in 
rsync, though, and could be completely wrong.
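For what it's worth, rsync does have options that look relevant here: 
--inplace updates the destination file directly instead of writing a temporary 
copy, and --no-whole-file forces the delta-transfer algorithm even when both 
sides are local. A hedged sketch (the paths are hypothetical, and whether 
unchanged regions are left truly untouched at the block level, so that ZFS 
snapshots keep sharing them, is something I'd want someone to confirm):

```shell
# Hypothetical paths. --inplace rewrites only the changed regions of
# the existing destination file (no temp-file-and-rename); 
# --no-whole-file enables the rsync delta algorithm for local copies,
# where whole-file transfer is otherwise the default.
rsync --inplace --no-whole-file -av \
    /mnt/split_mirror/oracle/datafile.dbf \
    /tank/repo/oracle/datafile.dbf
```

If that combination really does leave unchanged blocks alone, a snapshot taken 
afterward should only charge the changed blocks against the previous one.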

I guess I'm really after that "tool". I know agents exist that can poll Oracle 
database files, work out which bits changed, and write those changes off 
somewhere; RMAN can do that, but that keeps things at the DBA level, and I 
need to keep this backup processing at the SA level. I'm just trying to find a 
way to migrate our data that is fast, reliable, and optimal.

Was checking out these threads: 
http://www.opensolaris.org/jive/thread.jspa?threadID=20276&tstart=0
http://www.opensolaris.org/jive/thread.jspa?threadID=22724&tstart=0

And I just saw an update to http://blogs.sun.com/AVS/. Maybe all my answers 
lie there. I'll dig around there for more, but I'd welcome feedback and ideas 
on this.

TIA
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
