> Date: Wed, 20 Jan 2016 23:04:20 -0800
> From: Wayne Davison <way...@samba.org>
> On Thu, Dec 25, 2014 at 2:48 AM, Ingo Brückl <i...@wupperonline.de> wrote:
> > On systems using nanoseconds differences should be taken into
> > consideration.
>
> The problem is that if you transfer from a filesystem that has nanoseconds
> to one that does not support it, rsync would consider most of the files to
> be constantly different, since the nanosecond values would only match if
> the source file happened to have 0 nanoseconds. So, the logic has to be
> improved to somehow detect such a case and treat the truncated values as
> equal. One possible improvement would be to skip the nanosecond check if
> the destination file has a nanosecond value of 0. That could possibly be
> improved if we figure out if a particular device ID supports nanoseconds
> somehow. I have a potential heuristic in mind that I can code up and see
> how it works.

Here's one idea, and note an important issue with ns times and --link-dest:

(a) For each end, see if any of the files being considered already have
nonzero nanosecond parts. If so, then that end of the transfer supports
nanosecond timing.

(b) If the sending filesystem appears to have nonzero ns parts, and the
receiving filesystem appears to have all-zero ns parts (including any
directories under consideration), the receiver may still support ns times,
but have been synchronized from a filesystem that didn't. We don't want to
perpetuate that on the -next- sync, however, so we can't just disallow ns
times on the receiver, or we'll never try them again.

(c) In case (b) above, therefore, if any file to be transmitted has a ns
time, transfer it and then immediately check the received file's timestamp.
If its ns time is still zero, then the receiving filesystem doesn't support
it, so disable ns times during the transfer. If it's nonzero, then enable.
(I am eliding the pipelining that happens during an actual rsync; that may
have to be dealt with somehow.)
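To make steps (a) through (c) concrete, here is a minimal sketch of the decision logic in Python. It is only an illustration of the idea, not rsync code: the names `NsMode`, `shows_ns`, `initial_mode`, and `resolve_probe` are made up for this example, and mtimes are assumed to be integers in nanoseconds (as `os.stat(path).st_mtime_ns` returns them). The handling of a sender with all-zero ns parts is my own assumption; the steps above don't cover it.

```python
from enum import Enum

NS_PER_SEC = 1_000_000_000


class NsMode(Enum):
    ENABLED = "enabled"    # compare and preserve ns parts
    DISABLED = "disabled"  # ignore ns parts for this transfer
    PROBE = "probe"        # case (b): undecided until one file is checked


def shows_ns(mtimes_ns):
    """Step (a): an end 'shows' ns support if any mtime has a nonzero ns part."""
    return any(t % NS_PER_SEC != 0 for t in mtimes_ns)


def initial_mode(sender_mtimes_ns, receiver_mtimes_ns):
    """Pick a starting mode from the mtimes already visible on each end."""
    if not shows_ns(sender_mtimes_ns):
        # Assumption (not covered in the steps above): nothing on the
        # sender carries ns information, so there is nothing to preserve.
        return NsMode.DISABLED
    if shows_ns(receiver_mtimes_ns):
        return NsMode.ENABLED
    # Step (b): the receiver may support ns but have been synced from a
    # filesystem that did not, so do not rule it out yet.
    return NsMode.PROBE


def resolve_probe(sent_mtime_ns, received_mtime_ns):
    """Step (c): after sending a file, re-read the received file's mtime
    and decide the mode for the rest of the transfer."""
    if sent_mtime_ns % NS_PER_SEC == 0:
        return NsMode.PROBE  # this file cannot tell us anything
    if received_mtime_ns % NS_PER_SEC == 0:
        return NsMode.DISABLED  # receiver truncated the ns part
    return NsMode.ENABLED
```

A caller would start with `initial_mode(...)` and, while the mode is `PROBE`, call `resolve_probe(...)` after each transferred file until it gets a definite answer. The pipelining caveat from (c) still applies; this sketch pretends each file is checked immediately after it lands.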
Also, check the directory mod time, and see if -that's- now nonzero; you
have a very small chance of it being zero if ns times are supported, and
you can check for being in or near that window. And the first time it's
nonzero in this filesystem, you know it'll work for everything else in
this fs.*

* "This filesystem" assumes either that you can detect mountpoints, or that
the heuristic should be applied per-directory, and that no directory has a
single-file mountpoint that doesn't support it, etc. I assume rsync must
already have some sort of logic like this for dealing with xattr support
per-fs, etc.

If this is flaky to do, then you might need --[no]ns-timing switches to
force rsync to do the right thing without complaining on every single file
if it guesses wrong.

I don't know if the rsync protocol is flexible enough to dynamically enable
or disable this capability partway through a transfer. If it isn't, then
there's an even more hackish approach, which is to add, and unconditionally
attempt to honor, a --ns-times-valid sort of switch. Users can then use the
heuristics above in a dummy transfer to know whether to set that switch for
the real transfer. (Or they may know out-of-band that their FS supports ns
times.) But I'd think such a switch and workaround should be last resorts.

I would really like to see ns times supported. I use dirvish to back up
filesystems, which uses rsync, and if I ever have to restore any files from
that (which I do more often due to accidentally deleting or bashing a file
than due to media failure), I lose the ns timestamps, and they're sometimes
extremely valuable forensically when I'm trying to debug something else.
Having them be 0 when I thought they shouldn't be has more than once cost
me time until I realized that I'm looking at files that were rsync'ed from
another host (either to duplicate a setup, or from a backup) and rsync
didn't preserve the ns times.
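Coming back to the "first time it's nonzero in this filesystem" point: one way to remember that per filesystem is to key on the device ID from stat(). The sketch below is only an illustration under that assumption (that `st_dev` is a good enough stand-in for "this filesystem", with the mountpoint caveats from the footnote above); the class name `NsSupportByDevice` is invented for the example and is not anything rsync has.

```python
import os

NS_PER_SEC = 1_000_000_000


class NsSupportByDevice:
    """Remember which device IDs have shown a nonzero ns mtime at least once."""

    def __init__(self):
        self._supported = set()

    def observe(self, dev, mtime_ns):
        # A single nonzero ns part is enough to mark the device as capable.
        # A zero ns part proves nothing, so it never clears the mark.
        if mtime_ns % NS_PER_SEC != 0:
            self._supported.add(dev)

    def observe_path(self, path):
        # lstat() so a symlink is judged on its own filesystem, not its
        # target's.
        st = os.lstat(path)
        self.observe(st.st_dev, st.st_mtime_ns)

    def supports_ns(self, dev):
        return dev in self._supported
```

Feeding every file and directory that gets stat'ed anyway through `observe_path()` costs nothing extra, and `supports_ns(dev)` then answers the question for everything else on that device.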
Unfortunately, of course, if rsync gets fixed now, it -will- consider every
single backed-up file in my dirvish vaults to be "new" and will insanely
bloat the vault (doubling its size) the next time it runs, and then I'll
have to tell faster-dupemerge how to re-merge all that stuff, too. (After
all, even if the file contents haven't changed, its metadata has, so
--link-dest is required to create a new copy of the file rather than
hardlink to one with a different timestamp.)

What I'd -really- like is for some sane interaction with --link-dest as
well (which probably requires another switch, alas), which basically says
"a change from ns-0 to ns-other with no other changes to the file is
considered the same file---update the timestamp to the new ns time but
don't break the hardlink", with a way of forcing that off for people who
aren't in my situation and do care about such a change.

Failing that, I'd need to do something like (a) run a backup in non-ns mode
by force, then (b) immediately re-run the backup in ns-mode -on the same
output directories-, i.e., -not- using --link-dest to create new dirvish
vaults. This should get the times resynchronized without breaking all the
hardlinks to the previous backups. (I suspect that this would force ns
times into files dozens of generations back in the vaults, since those
hardlinks would all share metadata, but that's okay and in fact desirable.)

Note that this change in rsync behavior would thus appear to need a pretty
big warning in the changelog and new-version announcements, telling people
who use --link-dest (which I assume means by-hand, via dirvish, and via
rsnapshot, at least) that they need to make some sort of workaround (TBD)
so as not to have their backups suddenly explode in both time and space.

I -still- think I'd like to see ns times in rsync, despite this caveat---the
longer it's delayed, the worse the situation gets.
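For concreteness, the "ns-0 to ns-other is the same file" rule I'm asking for with --link-dest might look something like the sketch below. This is a hypothetical comparison function, not how rsync decides today; the name `mtime_matches_for_link_dest` and the `lenient` flag (standing in for the "way of forcing that off") are invented for the example, and it only looks at mtime, assuming size, contents, and other metadata have already been found equal.

```python
NS_PER_SEC = 1_000_000_000


def mtime_matches_for_link_dest(src_mtime_ns, ref_mtime_ns, lenient=True):
    """Compare a source mtime with the mtime of a --link-dest candidate.

    Returns (matches, needs_touch):
      matches     - the candidate may be hardlinked instead of copied
      needs_touch - the candidate's mtime should be updated to the source's
    """
    if src_mtime_ns == ref_mtime_ns:
        return True, False
    same_second = src_mtime_ns // NS_PER_SEC == ref_mtime_ns // NS_PER_SEC
    ref_truncated = ref_mtime_ns % NS_PER_SEC == 0
    if lenient and same_second and ref_truncated:
        # Same whole second and the old copy simply lost its ns part:
        # treat as the same file, but bring the timestamp up to date.
        return True, True
    return False, False
```

When `needs_touch` comes back true, the caller would hardlink as usual and then set the new mtime on the linked file (for example with `os.utime(path, ns=(atime_ns, src_mtime_ns))`). Since all the hardlinks share one inode, that update shows up in every older generation too, which is the side effect described above.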
(A coordinated change in the most-popular tools that use --link-dest to
implement a workaround or at least warn the user also seems wise;
otherwise, those who upgrade their OS and get a new version of rsync that
way, without reading release notes, may be surprised. Which means such
tools need a way of knowing which rsync implements ns times, presumably by
adding it to the "Capabilities" output of --version or something. Unless,
of course, the ns-0-to-ns-other-means-same-file-for-link-dest behavior is
the default, which I think is what I'd recommend, as long as there's a way
to turn that off and it's well-documented.)

P.S. The current situation also means that faster-dupemerge can't use that
information, either, because I can't trust it to be correct across hosts in
such situations. [I made a version of f-d that respected ns times, only to
abandon it when I realized that rsync wasn't preserving them!] I merge
-across- vaults with f-d to catch files that are the same on multiple
backed-up hosts, or to catch pushing a file from one host to another and
deleting it from the original host, or to merge identical files on the same
host in the backup even if they aren't merged on the host being backed up.

[Paul Slootman's request for FAT filesystems would be a generalization of
this sort of strategy, although I'd think that in that case it's a lot more
obvious to the user invoking rsync that the fs is FAT.]

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html