Hi,

On Wed, Oct 09, 2024 at 10:57:12AM +0200, Michel Verdier wrote:
> On 2024-10-08, Andy Smith wrote:
>
> > When you have hundreds of millions of files in rsnapshot it really
> > starts to hurt because every backup run involves:
> >
> > - Deleting the oldest tree of files;
>
> rsnapshot can rename it apart and delete it after backup is done. Thus
> involving only the backup system
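(For anyone following along, I read that suggestion as roughly the
following; the paths and snapshot names are made up for illustration:)

```shell
set -e
# Hypothetical snapshot root for illustration only.
backups=$(mktemp -d)
mkdir -p "$backups/daily.6"

# Rename the oldest snapshot aside. A rename is cheap, so the next
# backup run is not blocked waiting for a large recursive delete.
mv "$backups/daily.6" "$backups/daily.6.delete"

# ... the backup run would happen here ...

# Then delete the old tree afterwards, outside the critical path.
rm -rf "$backups/daily.6.delete"
```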
Yes, but this is still a necessary part of each backup cycle. You can't
do another backup run while that job is still outstanding, and the load
it puts on the system is still there regardless of the timing within
the backup procedure.

> > - Walking the entire tree of the most recent backup once to cp -l it
> >   and then;
>
> rsnapshot only renames directories when rotating backups then does
> rsync with hard links to the newest

Okay, yes: when you set link_dest to 1 in rsnapshot.conf then rsync
will do that bit during its run, but having to hard link a directory
tree of 5 million files is still not speedy. Other backup designs do
not do this because they don't need to take any form of copy of what is
already there. The point is that this step is "compare and hard link if
unchanged" whereas usually it is "compare and do nothing if unchanged".

> rsync uses metadata so it also depends on the filesystem. Some are
> quicker. I think metadata is quite like the index used by other backup
> systems.

The big difference is that to read the metadata of a tree of files in
the filesystem you have to walk all of the inodes, which is a lot of
small random access. 70 years of database theory has tried to make
queries efficient: minimise random access, maximise cache locality, and
so on. Otherwise all databases would just be filesystems!

Like I say, I like and use rsnapshot in some places, but speed and
resource efficiency are not its winning points.

Thanks,
Andy

-- 
https://bitfolk.com/ -- No-nonsense VPS hosting

