Paul Haas wrote:
Thanks for understanding the problem. I have two drives in my server and I use rsync to keep them in sync. But when rsync runs, the I/O climbs so high it practically stops MySQL from running. And - for the record - updatedb does the same thing.
It's a real issue, and it isn't specific to rsync. You've got a webserver
that runs well on your hardware, assuming reasonable disk caching and/or
disk I/O rates. rsync comes along and reads lots of files, thus clearing
your cache, and it walks the directory tree, doing reads spread all over
the disk. Reads scattered across the disk mean the disk heads spend
lots of time seeking back and forth under the famous elevator
algorithm. Your processes end up spending a lot of time waiting for that
elevator.
I don't think it requires changes to rsync, certainly nothing significant. I think you want a separate process to implement your policy. Something only a little more complex than this untested perl script:
===============cut here================
#!/usr/bin/perl -w
use strict;

my $tooHigh   = 4;   # Max acceptable load average
my $checkTime = 10;  # Seconds between checking load average
my $restTime  = 60;  # Seconds to pause disk hog process when load average high

my @rsyncPids = @ARGV;

while (1) {  # fix this, script should end when the pids exit.
    if ( LoadAvg() > $tooHigh ) {
        PausePids(@rsyncPids);
        sleep($restTime);
        ResumePids(@rsyncPids);
    }
    sleep($checkTime);
}

# Parse the 1-minute load average out of the uptime output.
sub LoadAvg {
    my $upString = `uptime`;
    my ($loadAvg) = ( $upString =~ m/load average: (\d+\.\d+)/ );
    return defined $loadAvg ? $loadAvg : 0;
}

# Signal names instead of numbers: 19 and 18 are not STOP and CONT everywhere.
sub PausePids  { kill 'STOP', @_; }
sub ResumePids { kill 'CONT', @_; }
===============cut here================
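One way the "fix this" comment in the script could be handled, as an untested-in-production sketch: drop pids from the list once they no longer exist and stop when the list is empty. LivePids is a hypothetical helper name, not something from the original script. It relies on Perl's kill with signal 0, which delivers nothing and only reports whether the pid could be signalled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the subset of the given pids that still exist.
# kill 0 sends no signal; it only checks that the pid can be signalled.
# $!{EPERM} means the process exists but belongs to another user.
sub LivePids {
    my @pids = @_;
    return grep { kill( 0, $_ ) || $!{EPERM} } @pids;
}

# Small demonstration using this script's own pid, which is certainly alive.
my @watched = LivePids($$);
print "still running: @watched\n";
```

In the script above, the main loop could then become something like "while ( @rsyncPids = LivePids(@rsyncPids) ) { ... }", so the throttle exits once everything it was watching has finished.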
This is very interesting. I'm not a programmer but it's the start of what I'm looking for and may be something that I could apply to updatedb as well. Can you make a few mods to it?
First - I'd like to pass a command line to it with the name of the program (regex) and have it find the pids. Then - have switches for the load level, check time, and pause time (-l 4 -c 10 -p 60) and -v for verbose, and -V for version, and -h for help, maybe -q to make existing throttles quit. So if it were called "throttle.pl" then the command line might look like this:
throttle -l 4 -c 10 -p 60 "rsync|updatedb"
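A rough sketch of how that command line might be parsed, assuming the core Getopt::Std module and a POSIX-style ps for the pid lookup. ParseArgs, MatchPids and FindPids are hypothetical names, the defaults are simply the values from the original script, and -q (making existing throttles quit) is accepted but not implemented here.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;

# Parse the proposed switches from a list of arguments.
# Returns a hash ref of options and the program-name regex.
sub ParseArgs {
    local @ARGV = @_;    # getopts works on @ARGV
    my %opt;
    getopts( 'l:c:p:vVhq', \%opt )
        or die "usage: throttle.pl [-l load] [-c secs] [-p secs] [-vVhq] regex\n";
    $opt{l} = 4  unless defined $opt{l};    # max acceptable load average
    $opt{c} = 10 unless defined $opt{c};    # seconds between checks
    $opt{p} = 60 unless defined $opt{p};    # seconds to pause the disk hog
    my $pattern = shift @ARGV;
    return ( \%opt, $pattern );
}

# Pick the pids whose command name matches $pattern from "pid command" lines.
# Our own pid is skipped so the throttle never pauses itself.
sub MatchPids {
    my ( $pattern, @psLines ) = @_;
    my @pids;
    for my $line (@psLines) {
        my ( $pid, $comm ) = $line =~ /^\s*(\d+)\s+(.+?)\s*$/ or next;
        push @pids, $pid if $pid != $$ && $comm =~ /$pattern/;
    }
    return @pids;
}

# Live lookup: "-o pid= -o comm=" asks ps for two columns without headers.
sub FindPids {
    my ($pattern) = @_;
    return MatchPids( $pattern, `ps -e -o pid= -o comm=` );
}

# Demonstration with the example command line and canned ps output.
my ( $opt, $pattern ) = ParseArgs(qw(-l 4 -c 10 -p 60 rsync|updatedb));
my @pids = MatchPids( $pattern, " 101 rsync\n", " 202 mysqld\n" );
print "limit $opt->{l}, check every $opt->{c}s, pause $opt->{p}s, pids: @pids\n";
```

Matching on the command name only (comm) rather than the full command line is a design choice: it avoids accidentally pausing an editor that happens to have "rsync" in a file name argument.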
Also - is there a variable in /proc you can read for load averages? Looks to me like this is almost a product!
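On Linux, /proc/loadavg holds the load averages: its first three fields are the 1, 5 and 15 minute values. A minimal sketch of reading it instead of running uptime follows; ParseLoadAvg is a hypothetical helper name, and the file does not exist on non-Linux systems, where this version simply returns undef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the 1-minute load average from a /proc/loadavg style line,
# e.g. "0.42 0.36 0.30 1/123 4567". Returns undef if the line is malformed.
sub ParseLoadAvg {
    my ($line) = @_;
    return unless defined $line;
    my ($oneMin) = $line =~ /^\s*(\d+(?:\.\d+)?)\s/;
    return $oneMin;
}

# Read the current value. Linux-specific; returns undef if the file is missing.
sub LoadAvg {
    open( my $fh, '<', '/proc/loadavg' ) or return;
    my $line = <$fh>;
    close $fh;
    return ParseLoadAvg($line);
}

my $load = LoadAvg();
print defined $load
    ? "1-minute load average: $load\n"
    : "/proc/loadavg not readable here\n";
```

Reading the file avoids forking a process every check interval, which is a small win on a machine that is already overloaded.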
If the load average climbs too high, it pauses rsync, or whatever pids you asked to pause.
I don't think rsync needs to be changed at all, provided you avoid anything involving timeouts.
There are certainly situations where rsync would be the important task, and it would be the other disk hog process that should pause.
It's debatable whether I count as a real developer.
Looks like you're a real developer to me!!!