On Wed, May 20, 2009 at 2:26 AM, Julian Pace Ross <li...@prisma.com.mt> wrote:
> Thanks Ryan!
> In fact I found it's a combination of factors you mentioned... i.e. a
> compressed SQL .bak file, so contrary to what I thought, the fuzzy file was
> indeed being found but no matches were being found in the file... thanks
> again for the info.

If you have the disk space at both ends, I would suggest doing what I do
for SQL backup synchronization.

1) Write *uncompressed* .bak files for your databases (with timestamps in
   the file name, such as those produced by the database maintenance plan
   engine). This enables the use of --fuzzy, as you have discovered.

2) Use rsync to transfer the uncompressed files, but with the -z option
   enabled. This compresses the data over the wire, but decompresses it at
   the receiving end.

3) Adjust the rsync block size to something smaller if necessary to find
   more matches. I basically went down to 32KB rsync blocks for one 15 GB
   database file (rsync would by default use something like 129KB on a
   file this big). This eats up a lot more CPU, but if rsync can still
   output data faster than your network connection can handle, it is the
   most time-efficient way to go. Use multiples of 8KB, as that is the
   internal page size inherent in MS SQL Server databases. Trial and error
   is your friend here. Run rsync with low priority (START /LOW rsync.exe)
   so the CPU usage doesn't impact SQL Server.

4) Minimize any jobs you have to automatically rebuild indexes. Use
   UPDATE STATISTICS instead on a daily basis, and rebuild only when index
   fragmentation gets heavy. There are lots of scripts out there on the
   net which will automate that for you.

5) Minimize the rebuilds of denormalized "reporting" tables or other
   non-essential data. Move these off into other databases that you don't
   replicate if possible.

6) Watch out for non-sequential clustered indexes. We use GUIDs for
   primary keys on many tables, and this causes updates and inserts to be
   spread randomly throughout the table as it is physically stored. Even
   changing just 5% of the data can result in a change to every database
   page in such a scenario. Hot tables which use emails or other VARCHAR
   fields as clustered index keys also result in similar behavior.

Most of these suggestions would apply for rsyncing any sort of database
backup file... Exchange, PostgreSQL, Oracle, or even (horror!) MySQL.

--
RPM
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
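
P.S. For what it's worth, here is a rough Python sketch of steps 2 and 3: it
checks that a candidate block size is a multiple of the 8KB SQL Server page,
shows how many blocks rsync would have to checksum on a 15 GB file at each
size (which is where the extra CPU goes), and builds the matching rsync
command line. The function names, the source and destination paths, and the
use of -a are my own assumptions for illustration, not something taken from a
real setup; --fuzzy, -z and --block-size are the standard rsync options.

```python
SQL_PAGE_BYTES = 8 * 1024  # MS SQL Server page size, as mentioned in step 3

# Hypothetical paths, for illustration only.
SRC = "/cygdrive/d/sqlbackups/"
DEST = "backuphost:/srv/sqlbackups/"


def block_size_bytes(kb):
    """Convert a block size in KB to bytes, insisting on a multiple of 8KB."""
    size = kb * 1024
    if size <= 0 or size % SQL_PAGE_BYTES != 0:
        raise ValueError("block size must be a positive multiple of 8KB")
    return size


def block_count(file_bytes, block_bytes):
    """Number of blocks a file of this size splits into (rounded up)."""
    return (file_bytes + block_bytes - 1) // block_bytes


def rsync_command(src, dest, block_kb=32):
    """Build an rsync argument list using --fuzzy, -z and a custom block size."""
    return [
        "rsync",
        "-a",        # assumption: archive mode; adjust to taste
        "--fuzzy",   # look for a similarly named basis file at the destination
        "-z",        # compress on the wire only
        f"--block-size={block_size_bytes(block_kb)}",
        src,
        dest,
    ]


if __name__ == "__main__":
    fifteen_gb = 15 * 2**30
    # Trial and error: compare how many blocks each candidate size produces.
    for kb in (8, 16, 32, 64, 128):
        print(kb, "KB ->", block_count(fifteen_gb, block_size_bytes(kb)), "blocks")
    print(" ".join(rsync_command(SRC, DEST)))
```

Halving the block size doubles the number of blocks to checksum, so it is
worth timing a few sizes against your actual link speed before settling on
one.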