Ed Spencer wrote:
I don't know of any reason why we can't turn 1 backup job per filesystem
into, say, up to 26 based on the cyrus file and directory structure.
No reason whatsoever. Sometimes the more the better, as per the rest of
this thread. The key here is to test and tweak until you find the optimal
balance of backup window time and performance. Performance tuning is a bit
of a journey that, sooner or later, reaches a final destination. ;)
The cyrus file and directory structure is designed with users located
under the directories A, B, C, D, etc. to deal with the
millions-of-little-files issue at the filesystem layer.
The Sun messaging server actually hashes the user names into a structure
which looks quite similar to a squid cache store. This has a top level of
128 directories, each of which in turn contains 128 directories, which
then contain a folder for each user that has been mapped into that part
of the structure by the hash algorithm on the user name. I use a wildcard
mapping to split this into 16 streams covering the hexadecimal directory
names 0-9, a-f, e.g. /mailstore1/users/0*
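As a sketch of that split, the 16 wildcard save set paths can be generated
like this (Python used purely for illustration; /mailstore1/users is the
store root from the example above, and nothing here is an actual
backup-software configuration):

```python
# Build the 16 wildcard paths (hex prefixes 0-9, a-f) that split the
# hashed user store into parallel backup streams. The store root
# /mailstore1/users comes from the example above; the rest is
# illustrative only.
HEX_PREFIXES = "0123456789abcdef"

save_sets = [f"/mailstore1/users/{p}*" for p in HEX_PREFIXES]

for path in save_sets:
    print(path)
```

Each resulting pattern (e.g. /mailstore1/users/0*) becomes one backup
stream, so the 128 top-level hash directories end up spread across 16 jobs.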
Our backups will have to be changed to use this design feature. There
will be a little work on the front end to create the jobs, but once that
is done the full backups should finish in a couple of hours. The nice
thing about this work is that it really is a one-off configuration in the
backup software, and then it is done. It certainly works a lot better
than something like ALL_LOCAL_DRIVES in NetBackup, which effectively
forks one backup thread per file system.
As an aside, we are currently upgrading our backup server to a sun4v
machine. This architecture is well suited to running more jobs in
parallel. I use a T5220 staging to a J4500 with 48 x 1 TB disks in a
zpool with 6 file systems. This then gets streamed to 6 LTO4 tape drives
in an SL500. Needless to say, this supports a high degree of parallelism
and generally finds the source server to be the bottleneck. I also take
advantage of the 10 GigE capability built straight into the UltraSPARC
T2. The only major bottleneck in this system is the SAS interconnect to
the J4500.
Thanx for all your help and advice.
Ed
On Tue, 2009-08-11 at 22:47, Mike Gerdts wrote:
On Tue, Aug 11, 2009 at 9:39 AM, Ed Spencer<ed_spen...@umanitoba.ca> wrote:
We back up 2 filesystems on Tuesday, 2 filesystems on Thursday, and 2 on
Saturday. We back up to disk and then clone to tape. Our backup people
can only handle doing 2 filesystems per night.
Creating more filesystems to increase the parallelism of our backup is
one solution, but it's a major redesign of the mail system.
What is magical about a 1:1 mapping of backup job to file system?
According to the Networker manual[1], a save set in Networker can be
configured to back up certain directories. According to some random
documentation about Cyrus[2], mailboxes fall under a pretty predictable
hierarchy.
1. http://oregonstate.edu/net/services/backups/clients/7_4/admin7_4.pdf
2. http://nakedape.cc/info/Cyrus-IMAP-HOWTO/components.html
Assuming that the way your mailboxes get hashed falls into a structure
like $fs/b/bigbird and $fs/g/grover (and not just $fs/bigbird and
$fs/grover), you should be able to set a save set per top-level
directory, or per group of a few directories. That is, create a save set
for $fs/a, $fs/b, etc., or for $fs/a - $fs/d, $fs/e - $fs/h, etc. If you
can create many smaller save sets and turn the parallelism up, you
should be able to drive more throughput.
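Grouping the per-letter directories into a handful of save sets can be
sketched like so (a Python illustration only; the filesystem root and the
group size of 4 are assumptions, not a real Networker configuration):

```python
# Sketch: chunk the 26 single-letter mailbox directories under a
# filesystem root into save set groups of a few directories each
# (a-d, e-h, ...). Paths are illustrative, not a real backup config.
import string

def group_save_sets(fs_root, group_size=4):
    letters = string.ascii_lowercase
    return [
        [f"{fs_root}/{c}" for c in letters[i:i + group_size]]
        for i in range(0, len(letters), group_size)
    ]
```

With group_size=4 this yields seven groups, the last one covering just
y-z; shrinking the group size trades more, smaller save sets for higher
potential parallelism.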
I wouldn't get too worried about ensuring that they all start at the
same time[3], but it would probably make sense to prioritize the
larger ones so that they start early and the smaller ones can fill in
the parallelism gaps as the longer-running ones finish.
3. That is, there is sometimes benefit in having many more jobs to run
than you have concurrent streams. This avoids having one save set
that finishes long after all the others because of poorly balanced
save sets.
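The footnote's point — many small jobs plus largest-first ordering avoids
one straggler save set finishing long after the rest — is essentially
longest-processing-time-first scheduling. A minimal sketch, with made-up
job sizes standing in for save set sizes:

```python
# Longest-processing-time-first: start the biggest save sets first so
# the small ones fill the parallelism gaps at the end. Job sizes are
# made-up illustrative numbers, not real backup data.
import heapq

def makespan_lpt(job_sizes, streams):
    """Finish time of the last stream when jobs start largest-first."""
    finish = [0.0] * streams              # min-heap of per-stream finish times
    heapq.heapify(finish)
    for size in sorted(job_sizes, reverse=True):
        earliest = heapq.heappop(finish)  # next stream to free up
        heapq.heappush(finish, earliest + size)
    return max(finish)
```

Running one big job and several small ones on two streams shows the
effect: the small jobs pack in behind the large one instead of all the
streams idling while a late-started giant finishes alone.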
Couldn't agree more, Mike.
--
Mike Gerdts
http://mgerdts.blogspot.com/
--
_______________________________________________________________________
Scott Lawson
Systems Architect
Manukau Institute of Technology
Information Communication Technology Services Private Bag 94006 Manukau
City Auckland New Zealand
Phone : +64 09 968 7611
Fax : +64 09 968 7641
Mobile : +64 27 568 7611
mailto:sc...@manukau.ac.nz
http://www.manukau.ac.nz
________________________________________________________________________
perl -e 'print
$i=pack(c5,(41*2),sqrt(7056),(unpack(c,H)-2),oct(115),10);'
________________________________________________________________________
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss