On 13/06/2025 at 08:57, Burkhard Linke wrote:
Hi,
On 12.06.25 21:58, Daniel Vogelbacher wrote:
Hi Eric,
On 6/12/25 17:33, Eric Le Lay wrote:
I use rsync to copy data (~10TB) to backup storage.
To speed things up I use the ceph.dir.rctime extended attribute to
instantly ignore sub-trees that haven't changed without iterating
through their contents.
I have to maintain the ceph.dir.rctime value between backups: I just
keep it in a file per top-level directory on the target storage.
This sounds interesting. Can you give some advice how to set this up
with rsync?
The rctime attribute is useful, but I wouldn't rely on it. As far as I
know it is stored in a directory inode, so each operation on a file or
directory will update the rctime on all path elements (not sure whether
this happens synchronously or asynchronously).
The problem is that it is just a single value. Imagine a rogue user or
rogue host that touches a file in a subdirectory, sets the ctime to
01/01/2300, and then removes the file. Although the removal is the last
operation, setting the ctime also updates the rctime of all path
elements, and removing the file cannot revert this. So your backup check
will see a last change on 01/01/2300 for the subtree and will probably
perform a complete rsync, even if all files are still the same.
rctime is fine for controlled environments without rogue elements (== no
users... ;-) ). And it can definitely be used to skip subtrees. But the
check can easily be rendered useless.
Best regards,
Burkhard
Thanks Burkhard for the warning.
Indeed there has been at least one case reported [1].
Checking for an rctime in the future during backup and raising an error
would make sense.
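
A minimal sketch of such a check, in Python. It assumes the attribute
value is a decimal "seconds.nanoseconds" string and only uses the
whole-second part; the helper names and the 300-second clock-skew
allowance are illustrative, not part of the actual script.

```python
import os
import time
from typing import Optional


def parse_rctime(raw: bytes) -> int:
    """Return the whole-second part of a value like b"1749797820.123456789"."""
    # Assumption: the value is "<seconds>.<nanoseconds>"; the fraction is ignored.
    return int(raw.decode().strip().split(".", 1)[0])


def read_rctime(path: str) -> int:
    """Read ceph.dir.rctime from a directory on a CephFS mount (Linux only)."""
    return parse_rctime(os.getxattr(path, "ceph.dir.rctime"))


def check_not_in_future(rctime: int, now: Optional[float] = None,
                        slack: int = 300) -> None:
    """Raise ValueError if rctime is more than `slack` seconds ahead of `now`."""
    now = time.time() if now is None else now
    if rctime > now + slack:
        raise ValueError(f"rctime {rctime} is in the future (now={int(now)})")
```

Calling check_not_in_future(read_rctime(src_dir)) before comparing
against the cached value would turn a bogus far-future rctime into a
visible error instead of a silent full rsync on every run.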
Otherwise, the script:
1. grabs the rctimes of $SRCDIR and its first-level subdirectories via
`find "$SRCDIR" -maxdepth 1 -type d -exec getfattr -n ceph.dir.rctime '{}' ';'`
2. loads cached metadata from the target filesystem (one file per dir,
in a special $DSTDIR/.rctimes dir)
3. if $SRCDIR has the same rctime as in cached metadata you can exit early
4. for each directory in $SRCDIR, if its rctime is greater than the
cached value or it is not in the cached metadata, run rsync on it and
save its rctime
(don't re-read it; use the rctime read before the rsync, in case
files change during the rsync)
5. for each directory in the target filesystem, if it is not in $SRCDIR,
delete it and its cached metadata
6. rsync all files directly in $SRCDIR, remove all files directly in
$DSTDIR but not in $SRCDIR
7. update cached metadata for $SRCDIR
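
The per-directory decisions in steps 4 and 5 can be sketched as a pure
function in Python. This is only an illustration: plan_backup and its
dict-based inputs are hypothetical, and it assumes the cached metadata
lists exactly the directories present on the target.

```python
from typing import Dict, List, Tuple


def plan_backup(src: Dict[str, int],
                cached: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Decide which first-level directories to rsync and which to delete.

    src    maps directory name -> rctime read from the source before any rsync
    cached maps directory name -> rctime saved on the target by the last run
    """
    # Step 4: sync directories that are new or whose rctime has increased.
    to_sync = sorted(name for name, rctime in src.items()
                     if name not in cached or rctime > cached[name])
    # Step 5: directories known on the target but gone from the source.
    to_delete = sorted(name for name in cached if name not in src)
    return to_sync, to_delete


to_sync, to_delete = plan_backup(
    src={"a": 10, "b": 20, "c": 30},
    cached={"a": 10, "b": 15, "d": 5},
)
print(to_sync, to_delete)  # ['b', 'c'] ['d']
```

After each successful rsync, the value to store is the one passed in
src, not a fresh read, so that changes made during the rsync are picked
up by the next run.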
Cheers,
[1] https://www.spinics.net/lists/ceph-users/msg75172.html