Ed Spencer wrote:
I don't know of any reason why we can't turn 1 backup job per filesystem
into, say, up to 26 based on the cyrus file and directory
structure.
No reason whatsoever. Sometimes the more the better, as per the rest of this thread. The key here is to test and tweak until you get the optimal balance of backup window time and performance.

Performance tuning is a bit of a journey that, sooner or later, has a final destination. ;)
The cyrus file and directory structure is designed with users located
under the directories A, B, C, D, etc. to deal with the millions-of-little-files
issue at the filesystem layer.
The Sun messaging server actually hashes the user names into a structure which looks quite similar to a squid cache store. This has a top level of 128 directories, each of which in turn contains 128 directories, which then contain a folder for each user that the hash algorithm maps into that structure from the user name. I use a wildcard mapping to split this into 16 streams covering the 0-9, a-f hexadecimal
directory names, e.g. /mailstore1/users/0*
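To make that concrete, the save set list for one store ends up as 16 wildcard entries, something like this (paths are from my environment; yours will differ):

   /mailstore1/users/0*
   /mailstore1/users/1*
   ...
   /mailstore1/users/9*
   /mailstore1/users/a*
   ...
   /mailstore1/users/f*

Each entry becomes its own stream, so the scheduler can run as many of the 16 in parallel as the client and the drives will sustain.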
Our backups will have to be changed to use this design feature.
There will be a little work on the front end to create the jobs, but
once done the full backups should finish in a couple of hours.
The nice thing about this work is that it really is only a one-off configuration in the backup software, and then it is done. It certainly works a lot better than something like ALL_LOCAL_DRIVES
in NetBackup, which effectively forks one backup thread per file system.
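By way of a purely illustrative contrast (directive and attribute names from memory; check the NetBackup docs before relying on this), a backup selections list of

   ALL_LOCAL_DRIVES

gives you one stream per mounted file system and nothing finer, whereas an explicit list such as

   /mailstore1/users/0*
   /mailstore1/users/1*
   ...

lets you carve a single large file system into as many streams as you like.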
As an aside, we are currently upgrading our backup server to a sun4v
machine.
This architecture is well suited to run more jobs in parallel.
I use a T5220, staging to a J4500 with 48 x 1 TB disks in a zpool with 6 file systems. This then gets streamed to 6 LTO4 tape drives in an SL500. Needless to say, this supports a high degree of parallelism and generally finds the source server to be the bottleneck. I also take advantage of the 10 GigE capability built straight into the UltraSPARC T2. The only major bottleneck in this system is the SAS interconnect to the J4500.
Thanx for all your help and advice.

Ed

On Tue, 2009-08-11 at 22:47, Mike Gerdts wrote:
On Tue, Aug 11, 2009 at 9:39 AM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
We back up 2 filesystems on Tuesday, 2 filesystems on Thursday, and 2 on
Saturday. We back up to disk and then clone to tape. Our backup people
can only handle doing 2 filesystems per night.

Creating more filesystems to increase the parallelism of our backup is
one solution, but it's a major redesign of the mail system.
What is magical about a 1:1 mapping of backup jobs to file systems?
According to the Networker manual[1], a save set in Networker can be
configured to back up certain directories.  According to some random
documentation about Cyrus[2], mailboxes fall under a pretty
predictable hierarchy.

1. http://oregonstate.edu/net/services/backups/clients/7_4/admin7_4.pdf
2. http://nakedape.cc/info/Cyrus-IMAP-HOWTO/components.html

Assuming that the way your mailboxes get hashed falls into a
structure like $fs/b/bigbird and $fs/g/grover (and not just
$fs/bigbird and $fs/grover), you should be able to set a save set per
top-level directory or per group of a few directories.  That is,
create a save set for $fs/a, $fs/b, etc., or for $fs/a - $fs/d,
$fs/e - $fs/h, etc.  If you are able to create many smaller save sets
and turn the parallelism up, you should be able to drive more throughput.
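Untested sketch, but if memory serves the client's save set attribute
would simply list the directories, one per line:

   $fs/a
   $fs/b
   ...
   $fs/z

and the client's parallelism setting then controls how many of those
run at once.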

I wouldn't get too worried about ensuring that they all start at the
same time[3], but it would probably make sense to prioritize the
larger ones so that they start early and the smaller ones can fill in
the parallelism gaps as the longer-running ones finish.

3. That is, there is sometimes benefit in having many more jobs to run
than you have concurrent streams.  This avoids having one save set
that finishes long after all the others because of poorly balanced
save sets.
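A quick way to see how unbalanced the buckets are before committing to
a layout is to size each top-level directory and sort largest-first.
A rough sketch in Python, assuming the $fs/<letter> layout above
(du -sk is the usual Solaris/GNU invocation):

   #!/usr/bin/env python
   # Print each top-level hash directory, largest first, so the big
   # save sets can be scheduled to start early.
   import os, subprocess, sys

   def du_kb(path):
       # "du -sk" prints "<kbytes><tab><path>"; take the first field.
       out = subprocess.check_output(["du", "-sk", path])
       return int(out.split()[0])

   fs = sys.argv[1]                      # e.g. /var/spool/imap
   dirs = [os.path.join(fs, d) for d in sorted(os.listdir(fs))
           if os.path.isdir(os.path.join(fs, d))]
   for kb, d in sorted(((du_kb(d), d) for d in dirs), reverse=True):
       print("%12d KB  %s" % (kb, d))

Feed the output to whoever builds the save sets and put the top few
entries at the front of the schedule.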
Couldn't agree more, Mike.
--
Mike Gerdts
http://mgerdts.blogspot.com/

--
_______________________________________________________________________


Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services
Private Bag 94006, Manukau City
Auckland, New Zealand

Phone  : +64 09 968 7611
Fax    : +64 09 968 7641
Mobile : +64 27 568 7611

mailto:sc...@manukau.ac.nz

http://www.manukau.ac.nz

________________________________________________________________________


perl -e 'print $i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'

________________________________________________________________________