Hi Cronfy,

On Sunday 24 October 2010 15:15:53 cronfy wrote:
> Hello,
>
> I have a web server (nginx + apache + mysql, FreeBSD 7.3) with many
> sites. Every night it creates a backup of /home on a separate disk.
> /home is a RAID1 mirror on an Adaptec 3405 (128M write cache) with SAS
> drives; /backup is a single SATA drive on the same controller.
>
> Rsync creates backups using hardlinks; it stores 7 daily and 4 weekly
> copies. The total amount of data is ~300G in 11M files. The server is
> under heavy web load the whole time (approx. 100 queries/sec).
>
> Every time the backup starts, the server slows down significantly and
> disk operations become very slow. It may take up to 10 seconds to
> stat() a file that is not in the filesystem cache. At the same time,
> rsync from a remote server does not affect disk load much, and the
> server works without slowdown.
>
> I think the problem can have two causes:
> * either the bulk of reads on the SATA /backup drive fills the OS
> filesystem cache, so many file accesses require a real disk read,
> * or the bulk of writes on /backup fills the controller's write cache
> and the geom disk operation queue grows, causing all disk operations
> to wait.
>
> This is only my assumption of course; I may be wrong.
Try "gstat -a" to see which one it is. I guess you'll mostly see bulk
reads on /home and bulk reads on /backup. When rsync starts, it indexes
the source and destination directory structures using readdir() and
stat() calls to see which files have changed (and need to be copied
later on). rsync offers the "--bwlimit" option to lower the network
bandwidth between an rsync server and a client, but this won't change
the stress the stat() calls generate while rsync indexes the
directories.

> How can I find the real reason for these slowdowns, so I can either
> conclude that it is impossible to solve because of hardware/software
> limits, or tune my software/hardware to make it all work at an
> acceptable speed?

You could try the patch below to rsync's "syscall.c" file, which pauses
rsync for a short period every second to reduce the I/O pressure it
creates. Changing the "500" to a lower value should scale the 'busy'
percentage "gstat -a" shows almost linearly to even lower levels.

--- syscall.c.org	2010-10-26 22:47:20.000000000 +0200
+++ syscall.c	2010-10-26 22:47:33.000000000 +0200
@@ -215,8 +215,19 @@
 #endif
 }
 
+void tiny_pause(void)
+{
+	struct timeval tv;
+
+	// only work in the first half of every second
+	gettimeofday(&tv, NULL);
+	if (tv.tv_usec > 500 * 1000)
+		usleep(1000 * 1000 - tv.tv_usec);
+}
+
 int do_stat(const char *fname, STRUCT_STAT *st)
 {
+	tiny_pause();
 #ifdef USE_STAT64_FUNCS
 	return stat64(fname, st);
 #else
@@ -226,6 +237,7 @@
 
 int do_lstat(const char *fname, STRUCT_STAT *st)
 {
+	tiny_pause();
 #ifdef SUPPORT_LINKS
 # ifdef USE_STAT64_FUNCS
 	return lstat64(fname, st);
@@ -239,6 +251,7 @@
 
 int do_fstat(int fd, STRUCT_STAT *st)
 {
+	tiny_pause();
 #ifdef USE_STAT64_FUNCS
 	return fstat64(fd, st);
 #else

Regards,
-- 
Daan Vreeken
VEHosting
http://VEHosting.nl
tel: +31-(0)40-7113050 / +31-(0)6-46210825
KvK nr: 17174380
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"