Lester,

You articulated your situation clearly enough for me.  Thanks.
I'll address your issue: when rsync runs locally to sync /vol/N to
/vol/N_mirror, it exhausts all of the RAM and swap.

If you haven't read jw schultz's "How Rsync Works" page, here is the link:

    http://www.pegasys.ws/how-rsync-works.html

The sender, receiver, and generator each have a full copy of the file list
(each file's entry uses 100 bytes on average).  Additionally, the
--hard-links option creates yet *another* full copy of the file list in the
receiver, so that's even more memory consumed.

So you are in a world o' hurt rsyncing an entire /vol/N internally with
--hard-links, since there will be FOUR copies of the file list.

I'd suggest breaking the /vol/N rsync up into separate rsyncs, one for each
of the maxdepth 1 hierarchies (a rough sketch of what I mean appears below,
after your quoted backup logic).  If I understand your situation correctly,
all hard link groups are self-contained within each of those hierarchies,
so you will be OK.

I've modified hlink.c to use a list of file struct pointers instead of
copies of the actual file structs themselves, so that will save memory.
I'll submit that patch for review in a day or two after I've tested it.

--
    John Van Essen    Univ of MN Alumnus    <[EMAIL PROTECTED]>


On Sat, 3 Jan 2004, Lester Hightower <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I read with interest the mailing list thread found here:
>
>   http://marc.10east.com/?t=107160967400007&r=1&w=2
>
> We have a "situation" with rsync and --hard-links that was the reason for
> my search in MARC's rsync list archive that turned up the thread shown
> above.  After reading through that thread, and other information on this
> topic, I believe that sharing our situation with you will in itself prove
> to be a good contribution to rsync (which is an excellent tool, BTW).
>
> So, here goes:
>
> We have a process on a backup server (I called it "s" below) that each
> night rsyncs a full copy of /, /var, and /usr from a great number of
> systems.  As a rule we put /, /var, and /usr on separate partitions, but
> that detail is not important.  What is important is to understand exactly
> how we do these nightly, full system backups.  First, let me start by
> showing you what a small set of the system_backups hierarchy looks like:
>
> [EMAIL PROTECTED]:/vol/6/system_backups# find . -type d -maxdepth 1
> .
> ./client1
> ./docs1.colo1
> ./docs2.colo1
> ./ipfw-internal.colo1
> ./ipfw1
> ./ipfw2
> ./docsdev1
>
> [EMAIL PROTECTED]:/vol/6/system_backups# find . -type d -maxdepth 2|head -25|egrep -v '^\./[^/]+$'|sort
> .
> ./client1/20031223
> ./client1/20031224
> ./client1/20031225
> ./client1/20031226
> ./client1/20031227
> ./client1/20031229
> ./client1/20040102
> ./client1/current
> ./docs1.colo1/20031219
> ./docs1.colo1/20031223
> ./docs1.colo1/20031224
> ./docs1.colo1/20031225
> ./docs1.colo1/20031226
> ./docs1.colo1/20031227
> ./docs1.colo1/20031229
> ./docs1.colo1/20040102
> ./docs1.colo1/current
> ./docs1.colo1/image-20031218
> ./docs2.colo1/20031218
> ./docs2.colo1/20031219
> ./docs2.colo1/current
>
> OK, that gives you an idea of how the hierarchy looks.  Here is the
> critical part, though.  The logic that creates these each night looks
> like this:
>
>   TODAY=<YYYYMMDD for today>
>   for HOST in (<hosts>); do
>     cp -al $HOST/current $HOST/$TODAY
>     ...now rsync remote $HOST into my local $HOST/current...
>   done
>
> For those not familiar with the -l option to cp:
>
> [EMAIL PROTECTED]:/vol/6/system_backups# man cp|grep -B1 -A1 'hard links instead'
>        -l, --link
>               Make hard links instead of copies of non-directories.
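
To make the per-hierarchy suggestion above concrete, here is a minimal
sketch.  It assumes the layout from your listing
(/vol/6/system_backups/<host>/...), a /vol/6_mirror destination, and an
option set of -aH --delete; substitute whatever options your current
/vol/N job actually uses.  Each per-host run then only has to hold that
host's file list in memory (times four, as explained above), instead of
the file list for the whole volume:

    #!/bin/sh
    # Sketch: mirror each top-level hierarchy with its own rsync so the
    # in-memory file list stays small.  Paths and options are assumptions,
    # not a drop-in replacement for your existing /vol/N job.
    SRC=/vol/6/system_backups
    DST=/vol/6_mirror/system_backups

    mkdir -p "$DST"

    for HOSTDIR in "$SRC"/*/; do
        HOST=`basename "$HOSTDIR"`
        # -a preserves the usual attributes, -H (--hard-links) preserves
        # hard link groups within this hierarchy, --delete keeps the
        # mirror exact.
        rsync -aH --delete "$SRC/$HOST/" "$DST/$HOST/" ||
            echo "rsync of $HOST failed" >&2
    done

As a rough memory check: at about 100 bytes per file list entry and four
copies of the list, a single hierarchy with, say, 2 million entries needs
on the order of 2,000,000 * 100 * 4 = 800 MB, whereas the whole volume's
entries taken together are evidently enough to blow past your 3 GB of RAM
plus swap.  Note that anything on /vol/6 outside system_backups would
still need its own rsync pass.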
> What we end up with is a tree that is _very_ fast to rsync each night,
> with revision history going back indefinitely, at the disk usage cost of
> only files that change (rare) and the directories (about 8MB per machine).
> Note, however, that the _vast_ majority of file entries on these file
> systems (system_backups) are hard links.  Many inodes will have 20, 30, or
> more filename entries pointing at them (depending strictly on how much
> history we choose to keep).
>
> Keeping all that in mind, now understand that server "s" has /vol/(0..14)
> installed in its disk subsystem, and (the important part) each of those
> volumes has a slow mirror -- one rsync per day.  We do not keep those
> mirrors mounted, but you could think of /vol/0 having a /vol/0_mirror
> partner that is rsynced once every twenty-four hours.
>
> All of this works absolutely perfectly, with one exception: the daily
> rsync of /vol/N to /vol/N_mirror for volumes that hold system_backups, and
> the reason appears to be the --hard-links flag.  Rsync, which runs
> completely locally for the /vol/N to /vol/N_mirror work, exhausts all of
> the RAM and swap available to it on this machine (3GB), sends the machine
> into a maddening swap spiral, etc.  The issue only exists for /vol/N vols
> where we have "system_backups" stored.
>
> I wanted to share this circumstance with you because my reading of the
> discussion on this topic, though encouraging, left me with the impression
> that some might not be thinking about situations like this one, where it
> is perfectly normal and desired to have many hard links to one inode, and
> hundreds of thousands of hard links in one file system.
>
> To give you an idea of the type of information one can glean from such a
> backup process, here are a couple of examples.  Keep in mind that files
> with link-count of 1 changed on the date indicated by the directory:
>
> [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|head -2
> 20040102/root/.bash_history
> 20040102/tmp/.803.e4a1
>
> [EMAIL PROTECTED]:/vol/6/system_backups/client1# diff 20040102/root/.bash_history current/root/.bash_history
> 1d0
> < lynx http://localhost:1081 --source | grep Rebuilding | head -1 | cut 10-
> 500a500
> > ssh [EMAIL PROTECTED]
>
> [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|cut -d/ -f1,2,3,4|sort |uniq -c
>       1 20040102/SYMLINKS
>       1 20040102/root/.bash_history
>       1 20040102/tmp/.803.e4a1
>       1 20040102/usr/local/BMS
>      54 20040102/usr/local/WWW
>      17 20040102/usr/local/etc
>       1 20040102/usr/sbin/symlinks
>      42 20040102/vol/1/bmshome
>       1 20040102/vol/2/webalizer_working
>      12 20040102/vol/3/home
>
> You'll notice that the hard link counts in this file system are not very
> high yet (only 8), yet it is _very_ intensive to have rsync try to sync
> /vol/6/system_backups/client1 to /vol/6_mirror/system_backups/client1 with
> the --hard-links flag set:
>
> [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 ! -links 1 -type f -printf '%n\t%i\t%s\t%d\t%h/%f\n'|head -50|tail -5
> 8    11323   10108   2   20040102/bin/mknod
> 8    11324   25108   2   20040102/bin/more
> 8    11325   60912   2   20040102/bin/mount
> 8    11326   10556   2   20040102/bin/mt-GNU
> 8    11327   33848   2   20040102/bin/mv
>
>
> If there is anything that I did not articulate clearly, if you have any
> followup questions, if you would like us to test some code for you guys,
> or if there is anything else that you feel that I can do to help, please
> do not hesitate to ask.
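
One thing you could do before splitting the mirror job: double-check my
assumption that no hard link group crosses the top-level host directories.
A pipeline along these lines should do it -- this is a sketch I have not
tested, using the path from your listing and the same GNU find -printf
your examples already rely on:

    # Hypothetical sanity check: list <inode, top-level dir> for every
    # multiply-linked file, then print any inode that appears under more
    # than one top-level hierarchy.  No output means every hard link
    # group is self-contained and the per-hierarchy split is safe.
    cd /vol/6/system_backups &&
    find . -type f ! -links 1 -printf '%i %P\n' |
      awk '{ sub("/.*", "", $2); print $1, $2 }' |
      sort -u | cut -d' ' -f1 | uniq -d

It streams the whole tree once, so it is slow but should not need any
significant memory.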
> Sincerely,
>
> --
> Lester Hightower
> 10East Corp.
>
>
> p.s. 10East created and now supports the MARC system (marc.10east.com) in
> various ways, including hosting it, though it is primarily administered by
> Mr. Hank Leininger, a good friend and former employee.  I didn't see any
> mention of MARC on the rsync web site.  Please feel free to use it.