Thank you for all your replies; I'm collecting my responses in one message below:
On Tue, Aug 18, 2009 at 7:43 PM, Nicolas Williams <nicolas.willi...@sun.com> wrote:
> On Tue, Aug 18, 2009 at 04:22:19PM -0400, Paul Kraus wrote:
>> We have a system with some large datasets (3.3 TB and about 35
>> million files) and conventional backups take a long time (using
>> Netbackup 6.5 a FULL takes between two and three days; differential
>> incrementals, even with very few files changing, take between 15 and
>> 20 hours). We already use snapshots for day-to-day restores, but we
>> need the 'real' backups for DR.
>
> zfs send will be very fast for "differential incrementals ... with very
> few files changing" since zfs send is a block-level diff based on the
> differences between the selected snapshots. Where a traditional backup
> tool would have to traverse the entire filesystem (modulo pruning based
> on ctime/mtime), zfs send simply traverses a list of changed blocks
> that's kept up by ZFS as you make changes in the first place.

Our testing confirms that incremental zfs send is very fast, and appears to be bandwidth limited rather than limited by file count. For example, while testing incremental sends I got the following results:

    ~450,000 files sent, ~8.3 GB sent @   690 files/sec. and 13 MB/sec.
    ~900,000 files sent, ~13 GB sent  @   890 files/sec. and 13 MB/sec.
    ~450,000 files sent, ~4.6 GB sent @ 1,800 files/sec. and 19 MB/sec.

Full zfs sends produced:

    ~2.5 million files, ~87 GB  @ 500 files/sec. and 18 MB/sec.
    ~3.4 million files, ~100 GB @ 600 files/sec. and 19 MB/sec.

> For a *full* backup zfs send and traditional backup tools will have
> similar results as both will be I/O bound and both will have more or
> less the same number of I/Os to do.

The zfs send FULLs are in close agreement with what we are seeing with a FULL NBU backup.

> Caveat: zfs send formats are not guaranteed to be backwards
> compatible, therefore zfs send is not suitable for long-term backups.

Yup, we only need them for 5 weeks, and when we upgrade the server (and ZFS version) we would need to do a new set of fulls.

On Tue, Aug 18, 2009 at 8:54 PM, Mattias Pantzare <pant...@ludd.ltu.se> wrote:
> Conventional backups can be faster than that! I have not used
> netbackup, but you should be able to configure netbackup to run several
> backup streams in parallel. You may have to point netbackup to subdirs
> instead of the file system root.

We have over 180 filesystems on the production server right now, and we are really trying to avoid any manual customization of the backup policy. In a previous incarnation this data lived on a Mac OS X server in one FS (only about 4 TB total at that point); full backups took so long that we manually configured three NBU policies with many individual directories ... it was a nightmare as new data (and directories) were added.

On Tue, Aug 18, 2009 at 10:33 PM, Mike Gerdts <mger...@gmail.com> wrote:
> This was discussed in another thread as well.
>
> http://opensolaris.org/jive/thread.jspa?threadID=109751&tstart=0

Thanks for that pointer. I had missed that thread in my search; I just hadn't hit the right keywords.

This thread got me thinking about our data layout. Currently the data is broken up by both department and project: each department gets a zpool, and each project within a department gets a dataset/zfs. Departments range in size from one mirrored pair of LUNs (512 GB) to 11 mirrored pairs of LUNs (5.5 TB). Projects range from a few KB to 3.3 TB (and 33 million files). The files are all relatively small (images of documents), but there are many, many of them.
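For anyone curious, here is a minimal sketch of the layout and the send workflow I'm describing; the pool, dataset, snapshot, and host names below (deptA, proj1, drhost, drpool, and the dates) are made up for illustration:

    # one pool per department, one dataset per project
    zpool create deptA mirror c1t0d0 c2t0d0 mirror c1t1d0 c2t1d0
    zfs create deptA/proj1

    # nightly: snapshot, then send only the blocks changed since the
    # previous snapshot to the DR box
    zfs snapshot deptA/proj1@2009-08-18
    zfs send -i deptA/proj1@2009-08-17 deptA/proj1@2009-08-18 | \
        ssh drhost zfs receive -d drpool

    # a FULL is just a send of the latest snapshot with no -i
    zfs send deptA/proj1@2009-08-18 | ssh drhost zfs receive -d drpool

The incremental run is what produced the files/sec. and MB/sec. numbers above, regardless of how many files a project holds.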
Is there any throughput penalty for a dataset being part of a bigger zpool? In other words, am I more likely to get better FULL throughput if I move the data to a dedicated zpool instead of a child dataset? We *can* change our model to assign each project a separate zpool, but that would be wasteful of space. Perhaps we could move a given project to its own zpool once it grows past a certain size (>1 TB, maybe); but if there is no performance advantage, it's not worth the effort. I had assumed that a full zfs send would just stream the underlying zfs structure and not really deal with individual files, but if the dataset is part of a shared zpool then I guess it has to look at the files' metadata to determine whether a given file is part of that dataset.

P.S. We are planning to move the backend storage to JBOD (probably J4400), but that is not where we are today, and we can't count on that happening soon.

--
{--------1---------2---------3---------4---------5---------6---------7---------}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Designer, "The Pajama Game" @ Schenectady Light Opera Company ( http://www.sloctheater.org/ )
-> Technical Advisor, Lunacon 2010 (http://www.lunacon.org/)
-> Technical Advisor, RPI Players