Unfortunately, the hard links are the problem. To keep them straight, rsync has to remember the details of every file it finds with a link count >1, so its memory use grows and grows. Of course, without -H rsync will end up duplicating them.
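If it helps to see how many files fall into that category before running rsync, something like the following could give a rough count. This is only a sketch, not how rsync is implemented: the function name count_hardlinked is made up for this example, and it assumes a POSIX filesystem where st_nlink and st_ino are meaningful.

```python
import os
import stat
import sys


def count_hardlinked(root):
    """Walk root and return (regular_files, files_with_nlink_gt_1, distinct_inodes).

    Hypothetical helper for illustration; it only reports what a tool
    preserving hard links would have to keep track of.
    """
    total = 0
    linked = 0
    inodes = set()  # one entry per distinct (device, inode) with nlink > 1
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, devices, etc.
            total += 1
            if st.st_nlink > 1:
                linked += 1
                inodes.add((st.st_dev, st.st_ino))
    return total, linked, len(inodes)


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    total, linked, groups = count_hardlinked(root)
    print(f"regular files:          {total}")
    print(f"files with nlink > 1:   {linked}")
    print(f"distinct linked inodes: {groups}")
```

Saved under any name (count_links.py is just a placeholder), it would be run as `python3 count_links.py /source`. Note that the set of inodes in this script grows with the number of linked files too, so on a tree of this size it shows the same kind of cost in miniature.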
On 6/25/20 10:30 AM, Andy Smith via rsync wrote:
> Hi,
>
> I have a virtual machine with 2G of memory. On this VM there is a
> directory tree with 33.3 million files in it. When attempting to
> rsync (rsync -PSHav --delete /source /dest) this tree from one
> directory to another on the same host, rsync uses all the memory and
> is killed by oom-killer.
>
> This host is Debian oldstable so has
>
> $ rsync --version
> rsync version 3.1.2 protocol version 31
>
> The normal operation of this VM does not require more than 2G of
> memory, but I doubled it to 4G anyway. Unfortunately rsync still
> uses all the memory and is killed.
>
> Most advice I can find on decreasing rsync memory usage advises to
> split the job up into batches. By issuing one rsync for each
> directory within /source I was able to make this work.
>
> The interesting thing is though, the split of file numbers between
> sub-directories is very uneven with the majority of them (31.5
> million of the 33.3 million) being in just one of the sub-directory
> trees. I am kind of surprised that rsync has such a problem going
> just that little bit further with the last 2 million. Is there any
> scope for improvement with the incremental recursion code?
>
> If I upgraded the version of rsync could I expect this to work any
> better?
>
> I could also give the host a massive swap file. It currently has
> just 1G of swap, which all gets used in the failure case. I could
> add more but I fear that the job will go so slow it will not
> complete in a reasonable time.
>
> I don't know if the -H option is causing extra memory usage here;
> unfortunately it is necessary as there are hardlinks in there.
>
> Some years old advice says to disable incremental recursion with
> --no-i-r. As incremental recursion was added to reduce memory usage
> this seems counter-intuitive to me, but this advice is all over the
> Internet…
>
> These are all things I will investigate before settling for the
> "split into multiple jobs" approach; just wondered if anyone has any
> shortcuts for me.
>
> Thanks,
> Andy
>

-- 
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
	Kevin Korb			Phone:    (407) 252-6853
	Systems Administrator		Internet:
	FutureQuest, Inc.		ke...@futurequest.net  (work)
	Orlando, Florida		k...@sanitarium.net (personal)
	Web page:			https://sanitarium.net/
	PGP public key available on web site.
~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,-*~'`^`'~*-,._.,
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html