Hello, I read with interest the mailing list thread found here:
http://marc.10east.com/?t=107160967400007&r=1&w=2

We have a "situation" with rsync and --hard-links, and it was the reason
for my search of MARC's rsync list archive that turned up the thread
shown above. After reading through that thread, and other information on
this topic, I believe that sharing our situation with you will in itself
prove to be a good contribution to rsync (which is an excellent tool,
BTW). So, here goes:

We have a process on a backup server (I call it "s" below) that each
night rsyncs a full copy of /, /var, and /usr from a great number of
systems. As a rule we put /, /var, and /usr on separate partitions, but
that detail is not important. What is important is to understand exactly
how we do these nightly, full system backups.

First, let me start by showing you what a small subset of the
system_backups hierarchy looks like:

  [EMAIL PROTECTED]:/vol/6/system_backups# find . -type d -maxdepth 1
  .
  ./client1
  ./docs1.colo1
  ./docs2.colo1
  ./ipfw-internal.colo1
  ./ipfw1
  ./ipfw2
  ./docsdev1

  [EMAIL PROTECTED]:/vol/6/system_backups# find . -type d -maxdepth 2|head -25|egrep -v '^\./[^/]+$'|sort
  .
  ./client1/20031223
  ./client1/20031224
  ./client1/20031225
  ./client1/20031226
  ./client1/20031227
  ./client1/20031229
  ./client1/20040102
  ./client1/current
  ./docs1.colo1/20031219
  ./docs1.colo1/20031223
  ./docs1.colo1/20031224
  ./docs1.colo1/20031225
  ./docs1.colo1/20031226
  ./docs1.colo1/20031227
  ./docs1.colo1/20031229
  ./docs1.colo1/20040102
  ./docs1.colo1/current
  ./docs1.colo1/image-20031218
  ./docs2.colo1/20031218
  ./docs2.colo1/20031219
  ./docs2.colo1/current

OK, that gives you an idea of how the hierarchy looks. Here is the
critical part, though. The logic that creates these each night looks
like this:

  TODAY=$(date +%Y%m%d)    # YYYYMMDD for today
  for HOST in <hosts>; do
      # snapshot yesterday's tree as hard links, not copies
      cp -al "$HOST/current" "$HOST/$TODAY"
      # ...now rsync remote $HOST into my local $HOST/current...
  done

For those not familiar with the -l option to cp:

  [EMAIL PROTECTED]:/vol/6/system_backups# man cp|grep -B1 -A1 'hard links instead'
         -l, --link
                Make hard links instead of copies of non-directories.

What we end up with is a tree that is _very_ fast to rsync each night,
with revision history going back indefinitely, at the disk usage cost of
only the files that change (rare) and the directories (about 8MB per
machine). Note, however, that the _vast_ majority of file entries on
these file systems (system_backups) are hard links. Many inodes will
have 20, 30, or more filename entries pointing at them (depending
strictly on how much history we choose to keep).

Keeping all that in mind, now understand that server "s" has
/vol/(0..14) installed in its disk subsystem, and (the important part)
each of those volumes has a slow mirror that is updated by one rsync per
day. We do not keep those mirrors mounted, but you could think of /vol/0
as having a /vol/0_mirror partner that is rsynced once every twenty-four
hours.

All of this works absolutely perfectly, with one exception: the daily
rsync of /vol/N to /vol/N_mirror for volumes that hold system_backups,
and the reason appears to be the --hard-links flag. Rsync, which is
running completely locally for the /vol/N to /vol/N_mirror work,
exhausts all of the RAM and swap available to it in this machine (3GB),
sends the machine into a maddening swap spiral, etc. The issue only
exists for /vol/N vols where we have "system_backups" stored.

I wanted to share this circumstance with you because my reading of the
discussion on this topic, though encouraging, left me with the
impression that some might not be thinking about situations like this
one, where it is perfectly normal and desired to have many hard links to
one inode, and hundreds of thousands of hard links in one file system.

To give you an idea of the type of information one can glean from such a
backup process, here are a couple of examples.
Keep in mind that files with a link count of 1 changed on the date
indicated by the directory:

  [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|head -2
  20040102/root/.bash_history
  20040102/tmp/.803.e4a1

  [EMAIL PROTECTED]:/vol/6/system_backups/client1# diff 20040102/root/.bash_history current/root/.bash_history
  1d0
  < lynx http://localhost:1081 --source | grep Rebuilding | head -1 | cut 10-
  500a500
  > ssh [EMAIL PROTECTED]

  [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 -links 1 -type f|cut -d/ -f1,2,3,4|sort |uniq -c
        1 20040102/SYMLINKS
        1 20040102/root/.bash_history
        1 20040102/tmp/.803.e4a1
        1 20040102/usr/local/BMS
       54 20040102/usr/local/WWW
       17 20040102/usr/local/etc
        1 20040102/usr/sbin/symlinks
       42 20040102/vol/1/bmshome
        1 20040102/vol/2/webalizer_working
       12 20040102/vol/3/home

You'll notice that the hard link counts in this file system are not very
high yet (only 8), but it is still _very_ intensive to have rsync try to
sync /vol/6/system_backups/client1 to /vol/6_mirror/system_backups/client1
with the --hard-links flag set:

  [EMAIL PROTECTED]:/vol/6/system_backups/client1# find 20040102 ! -links 1 -type f -printf '%n\t%i\t%s\t%d\t%h/%f\n'|head -50|tail -5
  8   11323   10108   2   20040102/bin/mknod
  8   11324   25108   2   20040102/bin/more
  8   11325   60912   2   20040102/bin/mount
  8   11326   10556   2   20040102/bin/mt-GNU
  8   11327   33848   2   20040102/bin/mv

If there is anything that I did not articulate clearly, if you have any
followup questions, if you would like us to test some code for you guys,
or if there is anything else that you feel I can do to help, please do
not hesitate to ask.

Sincerely,

--
Lester Hightower
10East Corp.

p.s. 10East created and now supports the MARC system (marc.10east.com)
in various ways, including hosting it, though it is primarily
administered by Mr. Hank Leininger, a good friend and former employee. I
didn't see any mention of MARC on the rsync web site. Please feel free
to use it.
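
For anyone who wants to size up a similar tree before mirroring it, the
find invocations above can be extended into a quick count of how many
file names share how many inodes. This is only a sketch under stated
assumptions: link_summary is a made-up helper name (not part of rsync),
it relies on GNU find's -printf as used in the listings above, and the
idea that rsync's memory use with --hard-links grows with these counts
is presumed here rather than measured.

```shell
#!/bin/sh
# link_summary DIR  (hypothetical helper; needs GNU find for -printf)
# Reports how many file names under DIR belong to multiply-linked
# regular files, and how many distinct inodes those names share.
link_summary() {
    dir=$1
    # Emit "<device>:<inode>" once per name whose link count is > 1,
    # then group identical keys so each inode is counted once.
    find "$dir" -type f -links +1 -printf '%D:%i\n' |
        sort |
        uniq -c |
        awk '{ names += $1; inodes += 1 }
             END { printf "%d names on %d inodes\n", names, inodes }'
}
```

It could be run as "link_summary /vol/6/system_backups/client1". Only
names found under the given directory are counted, so links that live
elsewhere on the same file system do not show up in the totals.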