On 18 March 2010, at 15:51, Damon Atkins wrote:

> A system with 100TB of data is 80% full, and a user asks their local
> system admin to restore a directory with large files, as it was 30 days ago,
> with all Windows/CIFS ACLs and NFSv4 ACLs etc.
>
> If we used zfs send, we would need to go back to a zfs send from some 30 days ago, and
> find 80TB of disk space to be able to restore it.
>
> zfs send/recv is great for copying zfs from one zfs file system to another file
> system, even across servers.
Bingo! The zfs send/recv scenario is for backup to another site or server, backup in this context being a second copy stored independently from the original/master.

In one scenario here, we have individual sites that have zvol-backed iSCSI volumes based on small, high-performance 15K disks in mirror vdevs for the best performance. I only keep about a week of daily snapshots locally. I use zfs send/recv to a backup system where I have lots of cheap, slow SATA drives in raidz2, where I can afford to accumulate a lot more historical snapshots. The interesting part is that you can use the same tools in an asymmetric manner, with high-performance primary systems and one or a few big, slow systems to store your backups.

Now, for instances where I need to go back and get a file off an NFS-published filesystem, I can just go browse the .zfs/snapshot directory as required, or search for it, or whatever I want. It's a live filesystem, not an inert object dependent on external indices and hardware.

I think that this is the fundamental disconnect in these discussions, where people's ideas (or requirements) of what constitutes "a backup" are conflicting. There are two major reasons for, and types of, backups: one is to be able to minimize your downtime and get systems running again as quickly as possible ("the server's dead - make it come back!"). The other is the ability to go back in time and rescue data that has become lost, corrupted or otherwise unavailable, often with very granular requirements ("I need this particular 12K file from August 12, 2009").

For my purposes, most of my backup strategies are oriented towards business uptime and minimal RTO. Given the data volume I work with, using lots of virtual machines, tape is strictly an archival tool: I just can't restore fast enough, and it introduces way too many mechanical dependencies into the process (well, I could if I had an unlimited budget).
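The asymmetric setup described above boils down to a handful of commands. A minimal sketch, assuming hypothetical pool and dataset names ("fast" on the primary, "slow" on a host called backuphost) and a live ZFS environment:

```shell
# Daily snapshot on the primary (fast 15K mirrors, short retention):
zfs snapshot fast/iscsi/vol1@2010-03-18

# Incremental replication of just the day's changes to the backup
# system (cheap SATA drives, long retention):
zfs send -i fast/iscsi/vol1@2010-03-17 fast/iscsi/vol1@2010-03-18 | \
    ssh backuphost zfs recv -F slow/backup/vol1

# Prune snapshots older than about a week on the primary only;
# the backup host keeps its copies:
zfs destroy fast/iscsi/vol1@2010-03-11

# Single-file recovery from an NFS-published filesystem is just a copy
# out of the live snapshot directory -- no restore job, no indices:
cp /export/home/.zfs/snapshot/2010-03-01/user/report.odt /export/home/user/
```

The names and dates are illustrative only; the point is that the primary and the backup run the exact same tooling, just with different retention policies.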
I can restart entire sites from a backup system by cloning a filesystem off a backup snapshot and presenting the volumes to the servers that need them. Granted, I won't have the performance of a primary site, but it will work and people can get work done. This responds to the first requirement of minimal downtime. Going back in time is accomplished via lots of snapshots on the backup storage system, which I can afford since I'm not using expensive disks there.

Then you move up the stack into the contents of the volumes, and here's where you use your traditional backup tools to get data off the top of the stack: out of the OS that's handling the contents of the volume and that understands its particularities regarding ACLs and private volume formats like VMFS.

zfs send/recv is for cloning data off the bottom of the stack without requiring the least bit of knowledge about what's happening on top. It's just like any of the asynchronous replication tools used in SANs, and those make no bones about the fact that they are strictly a block-level thing: don't even ask them about the contents. At best, they will try to coordinate filesystem snapshots and quiescing operations with the block-level snapshots. Other backup tools take your data off the top of the stack, in the context where it is used, with a fuller understanding of issues like ACLs.

When dealing with zvols, ZFS should have no responsibility for trying to understand what you do in there other than supplying the blocks. VMFS, NTFS, btrfs, ext4, HFS+, XFS, JFS, ReiserFS... and that's just the tip of the iceberg. ZFS has muddied the waters by straddling the SAN and NAS worlds.

> But there needs to be a tool:
> * To restore an individual file or a zvol (with all ACLs/properties)
> * That allows backup vendors (which place backups on tape or disk or CD or
> ..) to build indexes of what is contained in the backup (e.g.
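Restarting a site off the backup box is similarly short. A sketch, again with hypothetical names; the iSCSI presentation step assumes COMSTAR on OpenSolaris, and the target/view wiring is elided:

```shell
# A clone is writable immediately and consumes no extra space up front,
# so an entire site's volume can be brought back online in seconds:
zfs clone slow/backup/vol1@2010-02-18 slow/restore/vol1

# Present the cloned zvol as a SCSI logical unit (COMSTAR):
sbdadm create-lu /dev/zvol/rdsk/slow/restore/vol1
```

Performance will be whatever the SATA/raidz2 backup pool can deliver, but the servers see a normal volume and work can resume while the primary site is rebuilt.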
> filename, owner, size, modification dates, type (dir/file/etc.))
> * Stream output suitable for devices like tape drives.
> * Should be able to tell if a file is corrupted when being restored.
> * May support recovery of corrupt data blocks within the stream.
> * Preferably gnutar command-line compatible.
> * That admins can use to back up and transfer a subset of files, e.g. a user's home
> directory (which is not a file system), to another server or onto a CD to be
> sent to their new office location, or ????

Highly incomplete and in no particular order: Backup Exec, NetBackup, Bacula, Amanda/Zmanda, Retrospect, Avamar, Arkeia, Teradactyl, CommVault, Acronis, Atempo.

Conceptually, think of a ZFS system as a SAN box with built-in asynchronous replication (free!) with block-level granularity. Then look at your other backup requirements and attach whatever is required to the top of the stack, remembering that everyone's requirements can be wildly or subtly different, so doing it differently is just adapting to the environment. E.g., I use ZFS systems at home and at work; the tools and scale are wildly different, and therefore so are the backup strategies (but that's mostly a budget issue at home... :-)

Cheers

Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss