Hello everyone,

Imagine you have rsynced your mp3 archive. Later you do some cleanup: you rename files, split the directory up into a hierarchy, and move some files around.
Data-wise you did nothing, meta-data-wise you did a lot. --fuzzy comes to mind for the next rsync. Unfortunately, fuzzy matching does not look into other (sub-)directories, and it cares a little too much about modification times for this case.

I was thinking about introducing a superset of the current fuzzy matching (it works initially like the original, but tries more base files if nothing has matched so far), and/or two new threshold values, e.g. --fuzzy-thresholds 1000:20000. The numbers refer to the file size on the sender side: the first means “below this size, don’t even consider fuzzy matching” and the second means “above this size, try harder to find a base file”. This could default to --fuzzy-thresholds 0:<unlimited>, which is the old behaviour.

In the case of the more aggressive search: when the original algorithm runs out of base files, try _all_ files in the destination hierarchy that have exactly the same size, possibly sorted by the Levenshtein distance of the file name including its full path.

The idea is to catch simple copies and moves while still keeping unreasonable base files away. Especially with bigger files, the likelihood of two unrelated files having exactly the same size is pretty small. The risk is unnecessary checksum calculations against a wrong base file. If you think that risk is too high, don’t use the option.

Is there a good reason why this functionality is not in rsync yet?

Regards,
Robert
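
P.S. To make the proposed fallback a bit more concrete, here is a minimal sketch in Python (not rsync's C, and not taken from the rsync source). All names in it (index_by_size, fallback_candidates, lower, upper) are made up for illustration. It assumes the destination tree is indexed once by file size and that the two thresholds behave as described above, with upper=None standing in for <unlimited>.

```python
import os
from collections import defaultdict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (row-by-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute
        prev = cur
    return prev[-1]


def index_by_size(dest_root):
    """Map file size -> list of paths (relative to dest_root) with that size."""
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(dest_root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(full)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            by_size[size].append(os.path.relpath(full, dest_root))
    return by_size


def fallback_candidates(sender_path, sender_size, by_size, lower=0, upper=None):
    """Hypothetical 'try harder' step, used only after the normal fuzzy
    search found nothing.

    lower: below this size, no fuzzy matching at all.
    upper: above this size, search the whole destination tree;
           None means unlimited, i.e. the old behaviour (no extra search).
    """
    if sender_size < lower:
        return []
    if upper is None or sender_size <= upper:
        return []
    same_size = by_size.get(sender_size, [])
    # Closest full path first; ties are broken by path for a stable order.
    return sorted(same_size, key=lambda p: (levenshtein(sender_path, p), p))
```

The returned list would only be an ordering of base files to try; each one still has to go through the usual checksum comparison, which is exactly where the cost of a wrong guess shows up.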
