Data corruption

Linus Hicks Mon, 29 Aug 2005 11:29:33 -0700

We used rsync 2.6.3 on a couple of Solaris 8 machines to update an Oracle database from one machine to another. Here is the procedure I used:

The source database was up and running, so this operation was similar to doing a hot backup. I queried the source database for a list of tablespace names, and for each tablespace, I queried the list of datafiles. I put the tablespace in hot backup mode, which means that no updates are written to the datafiles; they will all go to the redo logs. Then I rsync'ed each datafile in that tablespace, then took the tablespace out of hot backup mode. Repeat for next tablespace.
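A minimal dry-run sketch of that per-tablespace loop, not the script actually used. It only prints the steps. The dictionary view dba_data_files, the tablespace name ARD, and the helper names list_datafiles_sql and plan_tablespace are assumptions for illustration; the rsync options are the ones from the command line quoted later in this post.

```shell
#!/bin/sh
# Dry-run sketch: prints the steps for one tablespace instead of running them.
# Assumptions: the SQL text and the tablespace name ARD are illustrative only.

# Print a query that lists the datafiles of one tablespace.
list_datafiles_sql() {
    printf "SELECT file_name FROM dba_data_files WHERE tablespace_name = '%s';\n" "$1"
}

# Print the steps for one tablespace: begin backup, copy each file, end backup.
# Usage: plan_tablespace REMOTE_HOST TABLESPACE DATAFILE...
plan_tablespace() {
    host=$1
    ts=$2
    shift 2
    printf 'ALTER TABLESPACE %s BEGIN BACKUP;\n' "$ts"
    for df in "$@"; do
        printf 'rsync -ptgoHS --stats --rsh=/usr/bin/rsh -B 8192 --no-whole-file --inplace %s:%s %s\n' \
            "$host" "$df" "$df"
    done
    printf 'ALTER TABLESPACE %s END BACKUP;\n' "$ts"
}

list_datafiles_sql ARD
plan_tablespace rmthost ARD /c03/oradata/can/ard04.dbf /c03/oradata/can/ard05.dbf
```

Printing the plan first makes it easy to confirm that every BEGIN BACKUP is paired with an END BACKUP before anything touches the database.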

Early on in this process, I discovered I had a big performance problem, and after some experimentation I learned some important things.

Mainly, rsync was apparently defaulting to whole-file mode, which is different from my past experience. Previously I had always supplied directories as the path to rsync, whereas this time I was doing individual files. I'm guessing that caused a different default behavior. After I started using --no-whole-file and --inplace, the situation improved. For files that had few differences, it was quite fast. However, for files that had lots of modified datablocks, it was still taking much longer than an rcp would. An rcp of a 4 GB datafile took about seven minutes, whereas rsync with about 10% modified data took about half an hour, as shown:


-- > Syncing Datafile: /c03/oradata/can/ard04.dbf @ Fri Aug 26 11:46:08 EDT 2005

Number of files: 1
Number of files transferred: 1
Total file size: 4294975488 bytes
Total transferred file size: 4294975488 bytes
Literal data: 403292160 bytes
Matched data: 3891683328 bytes
File list size: 72
Total bytes sent: 4194348
Total bytes received: 405243604

sent 4194348 bytes  received 405243604 bytes  239507.43 bytes/sec
total size is 4294975488  speedup is 10.49

-- > Syncing Datafile: /c03/oradata/can/ard05.dbf @ Fri Aug 26 12:14:37 EDT 2005
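The figures above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the --stats output for ard04.dbf. Two assumptions: the wall-clock time is taken as the gap between the two "Syncing Datafile" timestamps (which presumes ard05.dbf started right after ard04.dbf finished), and 420 seconds stands in for "about seven minutes" of rcp.

```shell
#!/bin/sh
# Figures copied from the --stats output for ard04.dbf.
total=4294975488
literal=403292160
matched=3891683328
sent=4194348
received=405243604
block=8192                                      # the -B value on the command line
start=$(( 11 * 3600 + 46 * 60 + 8 ))            # 11:46:08
end=$(( 12 * 3600 + 14 * 60 + 37 ))             # 12:14:37
rcp_secs=420                                    # assumption: "about seven minutes"

# All values are far below 2^53, so awk's floating point handles them exactly.
check_stats() {
    awk -v t="$total" -v l="$literal" -v m="$matched" -v s="$sent" \
        -v r="$received" -v b="$block" -v w="$(( end - start ))" -v c="$rcp_secs" 'BEGIN {
        printf "literal+matched == total: %s\n", (l + m == t) ? "yes" : "no"
        printf "literal share: %.1f%%\n", 100 * l / t
        printf "literal blocks: %d of %d\n", l / b, t / b
        printf "speedup: %.2f\n", t / (s + r)
        printf "wall clock: %d s (%.1f min)\n", w, w / 60
        printf "rsync throughput: %.1f MB/s of file data\n", t / w / 1e6
        printf "rcp throughput: %.1f MB/s of file data\n", t / c / 1e6
    }'
}
check_stats
```

Literal plus matched data adds up to the file size, the literal share works out to 9.4% (49230 of 524289 blocks of 8192 bytes), and the speedup of 10.49 is simply the file size divided by the bytes sent plus received. The run took 1709 seconds, so rsync moved the file at roughly a quarter of the rcp rate even though it sent only about a tenth of the data.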

Then when we started recovery on the destination database, Oracle complained about block zero being corrupted on six (out of more than 330) of the datafiles (one at a time). All of those were small, so I just used rcp to copy them (in hot backup mode). I started having misgivings then, but continued the process of recovering the database and finally got to applying the next to last redo log, and Oracle barfed on block corruption in one of our big datafiles.

All of the small datafiles that had block zero corrupted had a single block transferred via rsync. The process of opening a database and shutting it down will cause an update to block zero, and these datafiles are not really used during day-to-day operation, so it fits that rsync copied one block. In fact, there are a bunch of small datafiles similarly unused that had a single block transferred that Oracle did not complain about.
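One way to narrow this down would be to compare an rsync'ed copy of a datafile against a known-good copy (for example one made with rcp while the tablespace is still in hot backup mode) and list which 8192-byte blocks differ. The following is a sketch under that assumption; differing_blocks is a hypothetical helper, not part of rsync, and it only compares up to the length of the shorter file.

```shell
#!/bin/sh
# Print the zero-based numbers of the blocks in which two files differ.
# Usage: differing_blocks FILE_A FILE_B [BLOCK_SIZE]
differing_blocks() {
    # cmp -l lists every differing byte as "offset byte_a byte_b", with
    # offsets counted from 1. cmp exits non-zero when the files differ,
    # so its exit status is deliberately ignored.
    { cmp -l "$1" "$2" || true; } |
        awk -v bs="${3:-8192}" '{
            blk = int(($1 - 1) / bs)
            if (!(blk in seen)) { seen[blk] = 1; print blk }
        }'
}

# Demo on two scratch files of three 8192-byte blocks each.
tmp=$(mktemp -d)
dd if=/dev/zero of="$tmp/a" bs=8192 count=3 2>/dev/null
cp "$tmp/a" "$tmp/b"
# Change one byte inside block 1 of the copy.
printf 'X' | dd of="$tmp/b" bs=1 seek=8197 conv=notrunc 2>/dev/null
differing_blocks "$tmp/a" "$tmp/b"    # prints 1
rm -rf "$tmp"
```

If only block 0 shows up for the small datafiles, that would at least confirm the damage is confined to the one block rsync transferred.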


Here is the command line I used:

rsync -ptgoHS --stats --rsh=/usr/bin/rsh -B 8192 --no-whole-file --inplace \
rmthost:${df} ${df}

I probably shouldn't have used -H, and I saw a bug report about it, but can't believe it is related to my corruption problem. Is it possible -S is involved somehow?

The data corruption of course makes rsync useless to me for copying databases, and I'm wondering now if other things I use it for are susceptible to the same problem.

However, even if the corruption problem is fixed, the performance of rsync on large datafiles with more than a few percent of modified blocks may make it not worth using.


Any help is appreciated.

Linus
--
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
