On 16 May 2001 20:49:10 +0100, bernard.mcauley wrote:
> Hi,
>
> I'm currently using rsync to mirror a large directory at a remote site.
> The directory tree is over 40 GB in size and undergoes constant updates.
> However, I would like to have more control over the mirror as changes to
> the directory tree can cause problems. What I would like to do is to
> transfer 'diff' type output and store this before I update the directory
> tree. This would allow me to see what changes have been made and
> possible back track if the changes cause problems.
>
> I realise that something like CVS would probably be better for this, but
> the remote site is not under my control so I can't implement any new
> software. Does anyone know of a way I can get rsync to behave in this
> manner?

I have something that may help. It isn't exactly what you are looking
for, but it is very useful if you are mirroring large file systems where
users are prone to do things such as adding an extra folder into
pathnames. I've been mirroring just such a system for 9 months down a
128K link.

I use the attached Tcl script to help spot things such as renaming of
folders, gzip'ing of files and so on. I've called the script rthync; the
idea is that it "thinks" about what rsync plans to do and looks to see if
it can help it.

You run rsync with -n and --delete and pipe the output into rthync. The
script only works at the destination end, and it has to be told where the
destination directory is. The main idea is that the best place to look
for missing files is in the deletion list (e.g. if a file has been
renamed, gzip'ed, gunzip'ed etc).

The easiest way to run rthync is something like this:

    cd "destination"
    rsync -n --delete "source" "destination" | rthync -n -d "destination"

With the -n flag rthync will tell you what (if anything) it would do.
In my mirroring jobs I do things like this:

    cd "destination"
    rsync -n --delete "source" "destination" | rthync -d "destination"
    rsync --delete "source" "destination"

Really, I should be hacking rsync itself, where I could get at stuff like
file sizes and access times to help the scoring process when I have
several choices for a missing file, but this has been handy to prove the
idea. In the time that I've been running this it has saved a fair chunk
of bandwidth (as well as making some spectacularly bad choices for
renaming files; the good news is that such a mistake doesn't cost much).

You'll see some nonsense in the script prepending ./ to lots of
filenames. This is not unconnected with the bright NT hacker who thought
~$Normal.dot was a good name for a file, plus me being unable to make Tcl
resist doing tilde expansion on the ~.

Feel free to do whatever you see fit with the script; it's under the FTGH
(free-to-good-home) licence, but very definitely use at your own risk.

John

#!/usr/bin/tclsh
#
# rthync (pronounced r-think): make rsync think
#

proc rthyncer {dry_run} {
    set deletions() {}
    set updates {}
    set lookup() {}
    set actions() {}

    while {![eof stdin]} {
        set line [gets stdin]

        # skip header lines
        if {[regexp -- "^building file list" $line]} continue

        # check for end of data
        if {[regexp -- "wrote \[0-9\]+ bytes" $line]} break

        # check for deletions
        if {[regexp -- "^deleting " $line]} {
            regsub "^deleting " $line {} target
            if {![file isdirectory ./$target]} {
                set deletions(./$target) 1
                lappend lookup([file tail ./$target]) ./$target
            }
            continue
        }

        # anything else is file to be updated, we're only concerned
        # with ones that don't exist.
        if {![file exists ./$line]} {
            lappend updates ./$line
        }
    }

    # Now loop round updates to see if there are any ways we can
    # create them from stuff in deletions
    foreach update $updates {
        # check for gzip'ed file
        if {[file extension $update] eq ".gz"} {
            set rootname [file rootname $update]
            if {[info exists deletions($rootname)]} {
                # build the command as a list so names with spaces survive eval
                action_update $rootname $update 1000000 \
                    [list exec gzip $rootname] actions
                continue
            }
        }

        # check for file to gunzip
        if {[info exists deletions(${update}.gz)]} {
            action_update ${update}.gz $update 1000000 \
                [list exec gunzip ${update}.gz] actions
            continue
        }

        # see if there is a deletion in another folder that we can use
        set tail [file tail $update]
        if {[info exists lookup($tail)]} {
            scorem $lookup($tail) $update actions
        }
    }

    # now pass through actions and just extract the best one for
    # each target.
    set targets() ""
    dedupe_actions actions targets

    # now loop round actions and print them out + optionally carry them out
    foreach action [array names targets] {
        if {$action ne ""} {
            puts $targets($action)
            if {!$dry_run} {
                eval [lindex $targets($action) 2]
            }
        }
    }
}

# keep only the highest-scoring action for each target
proc dedupe_actions {acts targs} {
    upvar $acts actions
    upvar $targs targets

    foreach action [array names actions] {
        if {$action ne ""} {
            set targ [lindex $actions($action) 1]
            if {[info exists targets($targ)]} {
                if {[lindex $actions($action) 0] > [lindex $targets($targ) 0]} {
                    set targets($targ) $actions($action)
                }
            } else {
                set targets($targ) $actions($action)
            }
        }
    }
    return
}

# record {score target cmd} for a source, keeping the best score seen so far
proc action_update {source target score cmd actions} {
    upvar $actions action

    if {[info exists action($source)]} {
        if {$score > [lindex $action($source) 0]} {
            set action($source) [list $score $target $cmd]
        }
    } else {
        set action($source) [list $score $target $cmd]
    }
    return
}

# score each deleted source against a missing target by comparing folder names
proc scorem {sources target actions} {
    upvar $actions action

    set tail [file tail $target]
    set starget [file split [file dirname $target]]
    set slen [llength $starget]

    foreach source $sources {
        set slist [file split [file dirname $source]]
        set len2 [llength $slist]
        if {$slen < $len2} {
            # work out the offset before clamping len2; the original
            # clamped first, so offa always came out as 0
            set offa [expr {$len2 - $slen}]
            set offb 0
            set len2 $slen
        } else {
            set offa 0
            set offb [expr {$slen - $len2}]
        }

        set score 0
        for {set j 0} {$j < $len2} {incr j} {
            if {[lindex $starget $j] eq "."} continue
            # "eq" compares as strings, so folders such as 01 and 1 differ
            if {[lindex $starget $j] eq [lindex $slist $j]} {
                # give a bit more weight to things earlier in the path
                incr score [expr {1000 - $j}]
            }
            if {[lindex $starget [expr {$j + $offb}]] eq \
                    [lindex $slist [expr {$j + $offa}]]} {
                # give a bit more weight to things earlier in the path
                incr score [expr {1000 - $j}]
            }
        }
        action_update $source $target $score [list mv $source $target] action
    }
    return
}

proc mv {source target} {
    file mkdir [file dirname $target]
    file rename $source $target
}

# deal with the command line
set dry_run 0
for {set i 0} {$i < $argc} {incr i} {
    set arg [lindex $argv $i]
    # "switch" replaces the obsolete "case" command
    switch -- $arg {
        "-d" {
            incr i
            cd [lindex $argv $i]
            puts "Working directory [pwd]"
        }
        "-n" {
            set dry_run 1
        }
        default {
            # shouldn't get here
            puts stderr "Usage: rthync \[-d target\] \[-n\]"
            exit 1
        }
    }
}

if {!$dry_run} {
    puts "rthync: doing it for real"
} else {
    puts "rthync: dry run"
}

rthyncer $dry_run
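
The scoring step in scorem is easier to follow in isolation. Below is a minimal sketch of the idea, not the script's exact code. The names path_score and best_source and the sample paths are made up for illustration, and it assumes Tcl 8.5 or later for min() and eq. It compares the folder names of a deleted file and a missing file twice, once lined up from the front of the path and once lined up from the end, and gives slightly more weight to matches early in the path.

```tcl
# Hypothetical helper: score how well the folders of $source match the
# folders of $target. A higher score means more folder names agree.
proc path_score {source target} {
    set tdirs [file split [file dirname $target]]
    set sdirs [file split [file dirname $source]]
    set tlen [llength $tdirs]
    set slen [llength $sdirs]
    set n [expr {min($tlen, $slen)}]

    # Offsets that line the two lists up at their right-hand ends.
    set toff [expr {$tlen - $n}]
    set soff [expr {$slen - $n}]

    set score 0
    for {set j 0} {$j < $n} {incr j} {
        # Front-aligned comparison; the leading "." never counts.
        set t [lindex $tdirs $j]
        if {$t ne "." && $t eq [lindex $sdirs $j]} {
            incr score [expr {1000 - $j}]
        }
        # End-aligned comparison, which catches an inserted folder.
        set t [lindex $tdirs [expr {$j + $toff}]]
        if {$t ne "." && $t eq [lindex $sdirs [expr {$j + $soff}]]} {
            incr score [expr {1000 - $j}]
        }
    }
    return $score
}

# Hypothetical helper: pick the best-scoring deleted file for a target.
proc best_source {sources target} {
    set best ""
    set best_score -1
    foreach s $sources {
        set sc [path_score $s $target]
        if {$sc > $best_score} {
            set best $s
            set best_score $sc
        }
    }
    return [list $best $best_score]
}

# Made-up example: someone inserted a "current" folder into the path.
set target ./projects/current/alpha/spec.doc
set candidates {./projects/alpha/spec.doc ./archive/beta/spec.doc}
puts [best_source $candidates $target]
```

In this example the first candidate should win, because "projects" matches from the front and "alpha" matches from the end, while the second candidate shares no folder names with the target. The sketch only looks at names, so it shares the script's weakness: with no sizes or timestamps to go on, two files that happen to have the same name in similar folders can still be confused.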