We used rsync 2.6.3 on a couple of Solaris 8 machines to update an Oracle database from one machine to another. Here is the procedure I used:

The source database was up and running, so this operation was similar to a hot backup. I queried the source database for the list of tablespaces and, for each tablespace, the list of its datafiles. I put the tablespace in hot backup mode. In that mode Oracle keeps writing to the datafiles, but it freezes the checkpoint in their headers and logs a full image of each block to the redo logs the first time the block changes, so that recovery can later make the copy consistent. Then I rsync'ed each datafile in that tablespace, took the tablespace out of hot backup mode, and repeated the process for the next tablespace.
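For illustration, that per-tablespace loop can be sketched in POSIX sh. This is a hypothetical dry-run sketch, not the script used here. It assumes the tablespace/datafile pairs arrive on stdin already grouped by tablespace, for example spooled from DBA_DATA_FILES. The helper names `run_sql`, `sync_file` and `plan_copy` are made up, and the first two only print what they would do.

```shell
#!/bin/sh
# Hypothetical dry-run sketch of the per-tablespace hot-backup copy loop.
# Input on stdin (assumed): "TABLESPACE DATAFILE" pairs grouped by
# tablespace, e.g. the output of
#   SELECT tablespace_name, file_name FROM dba_data_files
#   ORDER BY tablespace_name;
# run_sql and sync_file only print; in real use they would pass the
# statement to sqlplus and run rsync on the datafile.

run_sql() {
    echo "SQL: $1"
}

sync_file() {
    echo "SYNC: $1"
}

plan_copy() {
    current=""
    while read -r ts df; do
        [ -z "$ts" ] && continue            # skip blank lines
        if [ "$ts" != "$current" ]; then
            # Take the previous tablespace out of backup mode first.
            if [ -n "$current" ]; then
                run_sql "ALTER TABLESPACE $current END BACKUP;"
            fi
            run_sql "ALTER TABLESPACE $ts BEGIN BACKUP;"
            current=$ts
        fi
        sync_file "$df"
    done
    # Take the last tablespace out of backup mode.
    if [ -n "$current" ]; then
        run_sql "ALTER TABLESPACE $current END BACKUP;"
    fi
}
```

Keeping only one tablespace in backup mode at a time limits how long the extra full-block redo is generated for each tablespace.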

Early in this process I ran into a serious performance problem, and after some experimentation I learned a few important things.

Mainly, rsync was apparently defaulting to whole-file mode, which differs from my past experience. Previously I had always given rsync directories as paths, whereas this time I was passing individual files; I'm guessing that caused the different default. After I added --no-whole-file and --inplace, the situation improved, and files with few differences synced quite fast. However, files with many modified data blocks still took much longer than an rcp would. An rcp of a 4 GB datafile took about seven minutes, whereas rsync took about half an hour on one with about 10% modified data, as shown:

-- > Syncing Datafile: /c03/oradata/can/ard04.dbf @ Fri Aug 26 11:46:08 EDT 2005

Number of files: 1
Number of files transferred: 1
Total file size: 4294975488 bytes
Total transferred file size: 4294975488 bytes
Literal data: 403292160 bytes
Matched data: 3891683328 bytes
File list size: 72
Total bytes sent: 4194348
Total bytes received: 405243604

sent 4194348 bytes  received 405243604 bytes  239507.43 bytes/sec
total size is 4294975488  speedup is 10.49
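These figures are consistent with roughly 10% changed data. The small helper below recomputes them from the --stats lines. It is a hypothetical function (`summarize_stats` is a made-up name) and assumes the lines look exactly as above. With -B 8192, it reports how many checksum blocks were sent as literal data and how many were matched. It also reports the literal share of the file and rsync's "speedup", which is total file size divided by bytes sent plus bytes received.

```shell
# Hypothetical helper: summarize an rsync --stats report read on stdin.
# $1 is the block size given to rsync -B (8192 in the command used here).
summarize_stats() {
    awk -v bs="$1" '
        /^Total file size:/      { total = $4 }
        /^Literal data:/         { lit = $3 }
        /^Matched data:/         { matched = $3 }
        /^Total bytes sent:/     { sent = $4 }
        /^Total bytes received:/ { recv = $4 }
        END {
            # Number of -B sized blocks sent literally vs. matched
            printf "blocks: %d literal, %d matched\n", lit / bs, matched / bs
            # Share of the file that had to be sent as literal data
            printf "literal: %.2f%%\n", 100 * lit / total
            # rsync speedup = total file size / (bytes sent + bytes received)
            printf "speedup: %.2f\n", total / (sent + recv)
        }'
}
```

Fed the statistics above with `summarize_stats 8192`, this should print 49230 literal and 475059 matched blocks, 9.39% literal data, and a speedup of 10.49, which matches rsync's own figure. The timestamps put the transfer at about 28.5 minutes (11:46:08 to 12:14:37), roughly four times the seven minutes rcp needed.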

-- > Syncing Datafile: /c03/oradata/can/ard05.dbf @ Fri Aug 26 12:14:37 EDT 2005


Then, when we started recovery on the destination database, Oracle reported block zero as corrupt in six of the more than 330 datafiles (one at a time). All of those were small, so I simply re-copied them with rcp (in hot backup mode). I started having misgivings then, but continued recovering the database until, while applying the next-to-last redo log, Oracle barfed on a corrupt block in one of our big datafiles.

All of the small datafiles with a corrupt block zero had had a single block transferred by rsync. Opening and shutting down a database updates block zero, and these datafiles are barely used in day-to-day operation, so it fits that rsync copied exactly one block. In fact, a number of similarly unused small datafiles also had a single block transferred, and Oracle did not complain about those.

Here is the command line I used:

rsync -ptgoHS --stats --rsh=/usr/bin/rsh -B 8192 --no-whole-file --inplace \
rmthost:${df} ${df}
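For reference, here is the same command wrapped in a POSIX sh function with each option annotated. The function name `sync_datafile` and the `RSYNC_HOST` variable are hypothetical. The flags are exactly those above, and quoting the path is the only change. The note that 8192 equals the database block size is an assumption.

```shell
# Hypothetical wrapper: the command line above, with each option annotated.
# The function name and RSYNC_HOST are illustrative; the path is quoted.
sync_datafile() {
    df=$1
    # -p -t -g -o      preserve permissions, modification time, group, owner
    # -H               preserve hard links
    # -S               handle sparse files efficiently
    # --stats          print transfer statistics
    # --rsh            use rsh as the remote shell
    # -B 8192          fixed checksum block size (assumed to equal the
    #                  database block size)
    # --no-whole-file  force the delta-transfer algorithm
    # --inplace        write changes directly into the existing file
    #                  instead of building a temporary copy
    rsync -ptgoHS --stats --rsh=/usr/bin/rsh -B 8192 \
        --no-whole-file --inplace \
        "${RSYNC_HOST:-rmthost}:$df" "$df"
}
```

Calling `sync_datafile /c03/oradata/can/ard04.dbf` runs the same transfer as the original command for that file.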

I probably shouldn't have used -H, and I saw a bug report about it, but I can't see how it would be related to my corruption problem. Could -S be involved somehow?

Of course, this data corruption makes rsync useless to me for copying databases, and now I'm wondering whether the other things I use it for are susceptible to the same problem.

However, even if the corruption problem is fixed, the performance of rsync on large datafiles with more than a few percent of modified blocks may make it not worth using.

Any help is appreciated.

Linus
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
