Is -R --link-dest really hard to use, or is it me?

2011-04-24 Thread foner-rsync
[Yes, I am reviving a thread from 27 months ago.  Why?  Because
I gave up on the problem way back then and didn't move the vault.
Now that I'm really trying to do this, it still doesn't make any
sense... :)  Matt CC'ed directly since he was the primary respondent
and I have no idea if such an old thread would otherwise be noticed.]

So, having tried your solution 1 and solution 2 (long pause while Matt
and/or others page in their state, probably by visiting something like
http://www.mail-archive.com/rsync@lists.samba.org/msg23196.html :),
I can't make either one work.

Here's a transcript of what happens.  Clearly I'm missing something.
A, B, C are hosts; 1, 2, 3 are ostensibly dates; all are representations
of a dirvish vault (e.g., extensively hardlinked --link-dest backups).

The src tree started out with all files with "foo" in their names all
hardlinked together.  Ideally, the dst tree will end up likewise.
If I rsync the entire tree at once, it works fine.  But the real
use case can't do this, because the trees are enormous and there
are dozens of them.  Note that when the tree is copied piecemeal, not
every foo* in dst winds up with the same inode as the rest.

Am I just up too late?
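For anyone wanting to reproduce this, the source layout in the transcript below can be recreated with something like the following (the file contents and 4-byte size are assumptions read off the listing):

```shell
# Work in a scratch directory; build one 4-byte file hardlinked six
# times across a/1, a/2, b/1, b/2, c/1, c/2, as in the transcript.
cd "$(mktemp -d)" || exit 1
mkdir -p src/a/1 src/a/2 src/b/1 src/b/2 src/c/1 src/c/2
printf 'foo\n' > src/c/2/foofoofoo   # 4 bytes, matching the listing
ln src/c/2/foofoofoo src/c/1/foofoo
ln src/c/2/foofoofoo src/b/2/b-foo2
ln src/c/2/foofoofoo src/b/1/b-foo
ln src/c/2/foofoofoo src/a/2/foo
ln src/c/2/foofoofoo src/a/1/foo
# rsync -aviH --stats src/ dst/   # -H carries the link structure into dst
```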

[nsn] 21:51:34 /home/blah# rsync -aviH --stats src/ dst/
sending incremental file list
.d..t...... ./
cd+++++++++ a/
cd+++++++++ a/1/
cd+++++++++ a/2/
cd+++++++++ b/
cd+++++++++ b/1/
cd+++++++++ b/2/
cd+++++++++ c/
cd+++++++++ c/1/
cd+++++++++ c/2/
>f+++++++++ c/2/foofoofoo
hf+++++++++ c/1/foofoo => c/2/foofoofoo
hf+++++++++ b/2/b-foo2 => c/2/foofoofoo
hf+++++++++ b/1/b-foo => c/2/foofoofoo
hf+++++++++ a/2/foo => c/2/foofoofoo
hf+++++++++ a/1/foo => c/2/foofoofoo

Number of files: 16
Number of files transferred: 1
Total file size: 24 bytes
Total transferred file size: 4 bytes
Literal data: 4 bytes
Matched data: 0 bytes
File list size: 255
File list generation time: 0.001 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 459
Total bytes received: 175

sent 459 bytes  received 175 bytes  1268.00 bytes/sec
total size is 24  speedup is 0.04
[nsn] 21:52:17 /home/blah# find . -ls
2241331604 drwxr-xr-x   4 root root 4096 Apr 24 21:51 .
2241331614 drwxr-xr-x   5 root root 4096 Apr 24 21:42 ./src
2241331664 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./src/b
2241331684 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./src/b/2
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/b/2/b-foo2
2241331674 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./src/b/1
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/b/1/b-foo
2241331694 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./src/c
2241331714 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./src/c/2
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/c/2/foofoofoo
2241331704 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./src/c/1
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/c/1/foofoo
2241331634 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./src/a
2241331654 drwxr-xr-x   2 root root 4096 Apr 24 21:44 ./src/a/2
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/a/2/foo
2241331644 drwxr-xr-x   2 root root 4096 Apr 24 21:43 ./src/a/1
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/a/1/foo
2241331624 drwxr-xr-x   5 root root 4096 Apr 24 21:42 ./dst
2241331754 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./dst/b
2241331804 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./dst/b/2
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/b/2/b-foo2
2241331794 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./dst/b/1
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/b/1/b-foo
2241331764 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./dst/c
2241331824 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./dst/c/2
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/c/2/foofoofoo
2241331814 drwxr-xr-x   2 root root 4096 Apr 24 21:49 ./dst/c/1
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/c/1/foofoo
2241331744 drwxr-xr-x   4 root root 4096 Apr 24 21:42 ./dst/a
2241331784 drwxr-xr-x   2 root root 4096 Apr 24 21:44 ./dst/a/2
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/a/2/foo
2241331774 drwxr-xr-x   2 root root 4096 Apr 24 21:43 ./dst/a/1
2241331834 -rw-r--r--   6 root root    4 Apr 24 21:43 ./dst/a/1/foo
[nsn] 21:52:26 /home/blah# find . -ls | grep foo
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 ./src/b/2/b-foo2
2241331724 -rw-r--r--   6 root root    4 Apr 24 21:43 

checksum-xattr.diff [CVS update: rsync/patches]

2007-07-02 Thread foner-rsync
Date: Mon, 2 Jul 2007 08:43:39 -0400
From: "Matt McCutchen" <[EMAIL PROTECTED]>

> *Note that "now" for a particular disk may not be the same as time() if
> the disk is remote, so network filesystems can be rather complicated.

That's easy to fix: get your "now" by touching a file on the
filesystem and reading the resulting mtime.

Unreliable.  If you sync up at the beginning of a run, and then the
remote system executes a large clock step (e.g., because it's not
running NTP, or it's misconfigured, or it is but NTP has bailed due to
excessive drift from hardware issues or a bogus driftfile, both of
which I've seen*), then "now" might glitch by a second (or more),
which is enough to break your idea of what "now" means---even a
smaller glitch can lead to races based on whose clock ticks first.
Sure, it's a low-probability event, but then, with low probability,
you have some file that isn't getting updated, which can lead to all
kinds of mysterious bugs, etc...

Seems to me the only way around this would be to do the touch before
-every- file you handle, which doubles the amount of statting going
on, etc.  And there are probably still timing windows there.
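For concreteness, the "touch a file and read back its mtime" probe being debated might look like the sketch below (hypothetical names; rsync does nothing of the sort today):

```python
import os
import tempfile

def fs_now(path: str) -> float:
    # Sample the filesystem's notion of "now": create a scratch file
    # there and read back the mtime that the filesystem (possibly a
    # remote server with its own clock) stamped on it.
    fd, probe = tempfile.mkstemp(dir=path)
    try:
        os.close(fd)
        return os.stat(probe).st_mtime
    finally:
        os.unlink(probe)
```

As the objection above says, this samples the server's clock only once; a later clock step on the server silently invalidates the sample unless you re-probe before every file.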

* [One of several ways I saw this happening was a motherboard that
accidentally had FSB spread-spectrum enabled, which caused the clock
to run fast.  NTP gave up slewing and started making larger and larger
steps until it was forced to bail out.  It took quite a while for this
problem to be noticed ("but the machine's running NTP!"), in part
because it took a while to manifest after each boot reset the clock.
Then, when the BIOS setting got fixed, the bad driftfile created by
NTP's valiant attempts to cope with the situation caused the clock
to misbehave in the -other- direction until the NTP conf stuff was
flushed and allowed to regenerate on its own with a working clock.]
-- 
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


checksum-xattr.diff [CVS update: rsync/patches]

2007-07-02 Thread foner-rsync
Date: Mon, 2 Jul 2007 21:18:57 -0400
From: "Matt McCutchen" <[EMAIL PROTECTED]>

The technique Wayne and I are discussing assumes only that the clock
on *each side* never steps backwards.  It compares the current mtime
and ctime on each side to the previous mtime and ctime on that side as
recorded in the cache.  Clock synchronization between the two sides is
irrelevant.
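A minimal sketch of that comparison (the names and cache layout are hypothetical, not rsync's actual code): a file is trusted as unchanged only if both its mtime and ctime match what the cache recorded on the previous run.

```python
import os

def maybe_changed(path: str, cache: dict) -> bool:
    # Return True if the file must be re-examined.  This is safe as long
    # as this side's clock never steps backwards; a backwards step could
    # recreate an old (mtime, ctime) pair and fool the check.
    st = os.stat(path)
    now = (st.st_mtime_ns, st.st_ctime_ns)
    if cache.get(path) != now:
        cache[path] = now
        return True
    return False
```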

Okay, but that's still unreliable.  Backward clock steps -can- happen;
only in Multics is it (mostly) impossible (because a backwards step
would destroy the filesystem).  But since rsync probably doesn't run
on Multics... :)

Consider a much more likely scenario---an NFS server reboots.  It's
perfectly okay for it to do this at any time, and the client NFS will
recover, without informing rsync.  It's quite possible for large clock
steps to happen upon reboot, especially for machines that might run
ntpdate on boot but not ntpd during normal operation.  In that case,
you've got about a 50% chance that there might be a backwards clock
step, and this could conceivably happen between any two NFS requests...

It is true that if either side's clock steps backwards, that side
could be fooled into thinking a file hasn't changed from the cache
when it really has.  There's very little we can do about that except
tell the sysadmin to delete all the caches when he/she sets the clock
backwards.

> Seems to me the only way around this would be to do the touch before
> -every- file you handle, which doubles the amount of statting going
> on, etc.  And there are probably still timing windows there.

I don't understand this concern.  If you'd like a more formal proof
that the technique never misses a modification assuming each side's
clock runs forward (actually, just each filesystem's clock), I would
be happy to provide one.

Working out such a proof would be interesting (because it might reveal
a flaw nobody's even thought about yet), but the first order of business
might be figuring out how to reliably detect a backwards step, or how
to make sure that users understand they might be silently screwed if
one happens.  I understand that it's a fairly low probability, and
depends on some questionable configurations, but rsync is well-known
to be both reliable and deterministic.  I'd hate for something like
this to start chipping away at that reputation, even if we -are-
talking about a corner case in a performance optimization that might
not get invoked all that much.  Not that my opinion in this matters a
whit to begin with; I just thought I'd point out a possible screw case
before it actually screwed someone.


checksum-xattr.diff [CVS update: rsync/patches]

2007-07-02 Thread foner-rsync
Date: Mon, 2 Jul 2007 21:18:57 -0400
From: "Matt McCutchen" <[EMAIL PROTECTED]>

The technique Wayne and I are discussing assumes only that the clock
on *each side* never steps backwards.

Um, and note, btw, that the pathological FSB-spread-spectrum/NTP
interaction I mentioned in my first message was causing a whole
-bunch- of backwards steps, over several months, until it was noticed.
I don't recall their magnitude, but I think it was a backwards step of
at least a second every few tens of minutes, until after quite some
time NTP simply exceeded its tolerance and punted, whereupon the clock
ran away.  But since it was -almost- holding it together, for days or
weeks at a time...

And the machine was an NFS server.  So in fact this scheme would have
been leading to a whole bunch of sporadic "why was this cache
inaccurate?" failures for a long time if rsync had been using this
strategy and someone had been using it against that server.


--hard-links performance

2007-07-11 Thread foner-rsync
Date: Wed, 11 Jul 2007 01:26:18 -0400
From: "George Georgalis" <[EMAIL PROTECTED]>

the program is http://www.ka9q.net/code/dupmerge/
there are 200 lines of well commented C; however
there may be a bug which allocates too much memory
(one block per file); so my application runs out. :\
If you (anyone) can work it out and/or bring it into
rsync as a new feature, that would be great. Please
keep the author and myself in the loop!

Do a search for "faster-dupemerge"; you'll find mentions of it in the
dirvish archives, where I describe how I routinely use it to hardlink
together filesystems in the half-terabyte-and-above range without
problems on machines that are fairly low-end these days (a gig of RAM,
a gig or so of swap, very little of which actually gets used by the
merge).  Dirvish uses -H in rsync to do most of the heavy lifting, but
large movements of files from one directory to another between backups
won't be caught by rsync*.  So I follow dirvish runs with a run of
faster-dupemerge across the last two snapshots and across every
machine being backed up (e.g., one single run that includes two
snapshots per backed-up machine); that not only catches file movements
within a single machine, but also links together backup files -across-
machines, which is quite useful when you have several machines which
share a lot of similar files (e.g., the files in the distribution
you're running), or if a file moves from one machine to another, etc,
and saves considerable space on the backup host.  [You can also trade
off speed for space, e.g., since the return on hardlinking zillions of
small files is relatively low compared to a few large ones, you can
also specify "only handle files above 100K" or whatever (or anything
else you'd like as an argument to "find") and thus considerably speed
up the run while not losing much in the way of space savings; I
believe I gave some typical figures in one of my posts to the dirvish
lists.  Also, since faster-dupemerge starts off by sorting the results
of the "find" by size, you can manually abort it at any point and it
will have merged the largest files first.]
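The core of the dupemerge idea is simple enough to sketch. This is a much-simplified stand-in for faster-dupemerge (which additionally takes find(1) arguments, sorts by size so the largest files are merged first, and is careful about metadata), shown only to illustrate the technique:

```python
import hashlib
import os
from collections import defaultdict

def dupemerge(root: str, min_size: int = 1) -> int:
    """Hardlink identical regular files under root; return links made."""
    by_size = defaultdict(list)
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.lstat(path).st_size
                if size >= min_size:   # cheap size filter, like find -size
                    by_size[size].append(path)
    linked = 0
    for paths in by_size.values():
        if len(paths) < 2:
            continue                   # unique size: no duplicate possible
        by_hash = defaultdict(list)
        for path in paths:
            with open(path, "rb") as f:
                by_hash[hashlib.sha256(f.read()).hexdigest()].append(path)
        for group in by_hash.values():
            keep = group[0]
            for path in group[1:]:
                if os.path.samestat(os.stat(keep), os.stat(path)):
                    continue           # already the same inode
                os.unlink(path)
                os.link(keep, path)    # replace duplicate with a hardlink
                linked += 1
    return linked
```

Raising min_size is the "only handle files above 100K" tradeoff described above: it shrinks the candidate set cheaply while giving up little of the space savings.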

http://www.furryterror.org/~zblaxell/dupemerge/dupemerge.html is the
canonical download site, and mentions various other approaches and
their problems.  (Note that workloads such as mine will also require
at least a gig of space in some temporary directory that's used by the
sort program; fortunately, you can specify on the command line where
that temp directory will be, and it's less than 0.2% of the total
storage of the filesystem being handled.)

* [Since even fuzzy-match only looks in the current directory, I
believe, unless later versions can be told to look elsewhere as well
and I've somehow missed that---if I -have- missed that, it'd be a nice
addition to be able to specify extra directories (and/or trees) in
which fuzzy-match should look, although in the limit that might
require a great deal of temporary space and run slowly.]


Rsync shouldn't display a meaningless speedup on a dry run

2007-11-06 Thread foner-rsync
> Date: Mon, 05 Nov 2007 13:17:32 -0500
> From: Matt McCutchen <[EMAIL PROTECTED]>

> I think rsync should omit the speedup on a dry run.  The attached patch
> makes it do so.

I worry about those trying to write things that parse rsync's output;
if -n changes the output format, such things will have to be tested on
live data.

Is it possible (e.g., without ridiculous amounts of code-massaging) to
have -n output the speedup (or some more-reasonable estimate) anyway?
Sure, all kinds of differences haven't been computed, but...  Or maybe
just have it report a speedup of 1.00 instead?  Still misleading, but
it preserves the output format and is trivial to write (but still,
alas, confusing for the user, so this doesn't fill me with glee).

Or we can just assume that such parsers might be looking at the file
list, but it's dubious that applications exist that care about the
speedup data and hence would be throwing away such lines anyway (and
would not break if it doesn't appear).
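For reference, the contested figure is just a ratio of totals, which is why a dry run (which transfers almost nothing) inflates it into meaninglessness:

```python
def speedup(total_size: int, sent: int, received: int) -> float:
    # rsync's closing line: "total size is N ... speedup is S", with
    # S = total_size / (bytes sent + bytes received).
    return total_size / (sent + received)

# Figures from the real run transcribed earlier in this digest:
print(f"speedup is {speedup(24, 459, 175):.2f}")   # speedup is 0.04
```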


Rsync shouldn't display a meaningless speedup on a dry run

2007-11-06 Thread foner-rsync
Date: Tue, 06 Nov 2007 23:18:08 -0500
From: Matt McCutchen <[EMAIL PROTECTED]>

On Tue, 2007-11-06 at 22:22 -0500, [EMAIL PROTECTED] wrote:
> I worry about those trying to write things that parse rsync's output;
> if -n changes the output format, such things will have to be tested on
> live data.

No, just run rsync's output through a sed script that adds the desired
speedup to the last line.

That changes the test setup quite a lot with and without -n.

> Is it possible (e.g., without ridiculous amounts of code-massaging) to
> have -n output the speedup (or some more-reasonable estimate) anyway?
> Sure, all kinds of differences haven't been computed, but...

Rsync could estimate an upper bound on how much a real run might send by
adding the size of the data that wasn't transferred (regular file data
and abbreviated xattrs) to the amount the dry run sent, but I'm not sure
the resulting value would be useful enough to make this worthwhile.

I could go either way.

> Or maybe
> just have it report a speedup of 1.00 instead?  Still misleading, but
> it preserves the output format and is trivial to write (but still,
> alas, confusing for the user, so this doesn't fill me with glee).

That lie would be no improvement over the current one.

Then how about this:  If your patch winds up in rsync, it requires a
patch to the manpage entry for -n that says, essentially, "You can't
trust the actual information emitted when running with -n to match
what gets emitted if you haven't specified -n.  Therefore, if you're
writing things that parse rsync's output, you must ensure that your
script works with and without -n.  Here is an itemization of those
things that might be different in its output with and without -n:
  (a) With -n, the speedup line will be omitted.
  (b) ?"
Etc.

At least that way, someone writing such a tool will be warned without
having to find out the hard way.

(I don't write such tools, but I've certainly seen some, and read some
chatter about them on this list.)


remote logging non-daemon mode

2007-12-05 Thread foner-rsync
> Date: Wed, 5 Dec 2007 23:21:27 -0500
> From: "Doug Lochart" <[EMAIL PROTECTED]>

> Each module needs to be protected from the others so if a user logs in
> with their credentials they should not have access to any other module.
> It would take a user knowing the name of another client to affect the
> security breach.  I admit I am no whiz at securing the rsync server.
> Once we had it setup to run in daemon mode we assumed the ssh tunnels
> would provide all that we need.  We overlooked this one issue however.

Are users supposed to be running any arbitrary rsync command they like
when they connect, or is there a canonical one for doing the backup?

If the latter, can you use ssh's "forced command" mode, with a
different command associated with each user?

Hmm.  I just did a search and found this, from two months ago:
http://www.mail-archive.com/rsync@lists.samba.org/msg19657.html

Relevant?


Is -R --link-dest really hard to use, or is it me?

2009-01-11 Thread foner-rsync
I've got a problem for which the combination of -R and --link-dest
doesn't seem to be quite enough---and I may have discovered a few
small bugs as well; test cases are below.

[And if someone has a scheme for doing this that doesn't involve rsync
at all, but works okay, I'm all ears as well---I'm not the first with
this problem.]

Here's my problem:  I unfortunately need to move a large dirvish
vault.  This is a directory tree consisting of -many- hardlinked
files, which means that moving it in pieces will copy many times more
data than is actually there, but trying to move the entire thing in
one shot consumes more RAM than is available.  [rsync on the toplevel
dir blew up almost immediately, as I expected.  cp -a was consuming at
least 130meg per snapshot and therefore looked likely to consume at
least 10G of RAM to finish; it's actually possible for other reasons
it might have been closer to 20G.  It thus got slower and slower as
it became more and more page-bound and I eventually got tired of it
thrashing itself to death; ETA might have been a few weeks at that
rate.  I can't just move the underlying blocks (e.g., copy the
partition as a partition) because the whole reason I'm moving this
filesystem in the first place is because it has errors that fsck is
having trouble fixing---bug or bad hardware isn't established yet.
And I don't know if dump/restore works well on ext3 filesystems, is
well-tested these days, will work for ext4 when I finally migrate to
that, or produces good data if the filesystem I'm starting with has
errors that fsck complains about (or if it, too, will consume enormous
amounts of RAM, but I'm assuming it's not trying to cache every inode
it dumps, so maybe that might work if I trusted it---opinions
anyone?)]

So---rsync to the rescue, except not.  A normal dirvish backup just
uses --link-dest against the previous host/date combo, and works fine.
I could copy the entire set of snapshots to a new filesystem the same
way, EXCEPT for a problem:  I took pains to hardlink files -across-
hosts' backups that were also the same, so I didn't have a zillion
copies of the same files that are all shared by most releases and any
linux anyway.  E.g., in this sort of arrangement:
  hostA/20080101
  hostB/20080101
  ...
  hostF/20080101
  ...
  hostA/20080102
  hostB/20080102
  ...
  hostF/20080102
  ...

dirvish (well, rsync) itself hardlinked files between hostA/20080101
and hostA/20080102 on successive runs, and then -I- ran a tool
(faster-dupemerge) that hardlinked identical files between
hostA/20080101 and hostB/20080101 (etc).  Once this is done across the
very first set of dumps (e.g., 20080101 in this example), then even
though rsync is doing --link-dest only from hostA to hostA on
successive runs, everything stays hardlinked together across hosts
because the same inode is being reused everywhere.  (I also run
faster-dupemerge across all hosts for the most-recent pair of backups
to catch files that have been -copied or moved-, either from one dir
to another on the same host, or across hosts.  Works great.)
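In rsync terms, the per-host, per-date copies just described look roughly like the sketch below (paths hypothetical; the function only prints the commands so they can be inspected first). Note that this preserves links within each host's snapshots but not the cross-host links, which is exactly the problem described next:

```shell
VAULT=/backup/vault    # existing vault (hypothetical path)
NEW=/mnt/new/vault     # destination filesystem (hypothetical path)

plan_copy() {   # print the per-host rsync commands for one date transition
  prev=$1; cur=$2
  for host in hostA hostB hostF; do
    printf 'rsync -aH --link-dest=%s %s/ %s/\n' \
      "$NEW/$host/$prev" "$VAULT/$host/$cur" "$NEW/$host/$cur"
  done
}

plan_copy 20080101 20080102
```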

Unfortunately, I can't get rsync to do the right thing when I'm trying
to copy this structure.  What I'd -like- to do is to take all of
hostA..hostF---for a single date---and copy them all at once, using
--link-dest to point back at the previous date's set of hosts as the
basis.  But because of the way the directories are structured, I need
to use -R so I get the same structure recreated, and that seems to
break --link-dest, unless there's some syntax issue in what I'm doing.

Small test case:

Imagine that "src" is my original filesystem, and "dst" is where I'm
trying to move things.  (Here, they share a superior directory, but of
course in real life they're different filesystems.)  "foo" is my test
file; there are multiple copies of it in src that are all hardlinked
together.  I've already done the push of the first vault's contents
from src to dst, so --link-dest has something to work with; note that
the inode numbers for foo in src and dst are different (since, again,
in real life, they're on different filesystems), but that all copies
of foo in either src or dst (so far) share the same inode.  The A, B,
and C directories correspond to individual hosts.

18:45:42 ~/H$ find . -name "foo" -ls
 844204 -rw-r--r--   2 blahblah   4 Jan 11 18:43 ./src/a/1/foo
 844204 -rw-r--r--   2 blahblah   4 Jan 11 18:43 ./src/a/2/foo
 844264 -rw-r--r--   1 blahblah   4 Jan 11 18:43 ./dst/a/1/foo
18:45:46 ~/H$ ~/rsync-3.0.5/rsync -aviH --link-dest=../1 src/a/2/ dst/a/2/
sending incremental file list
created directory dst/a/2
cd..t.. ./

sent 61 bytes  received 15 bytes  152.00 bytes/sec
total size is 4  speedup is 0.05
18:46:11 ~/H$ find . -name "foo" -ls
 844204 -rw-r--r--   2 blahblah   4 Jan 11 18:43 ./src/a/1/foo
 844204 -rw-r--r--   2 blahblah   4 Jan 11 18:43 ./src/a/2/foo
 844264 -rw-r--r--   2 blahblah  

Is -R --link-dest really hard to use, or is it me?

2009-01-26 Thread foner-rsync
> Date: Sun, 25 Jan 2009 01:02:15 -0500
> From: Matt McCutchen 

> I regret the slow response.  I was interested in your problem, but I
> knew it would take me a while to respond thoughtfully, so I put the
> message aside and didn't get back to it until now.  I hope this is still
> useful.

Yes, it is.  Thanks.

[The immediate need to move the filesystem is gone because the
underlying hardware problem has been solved, but eventually I'm
going to want to migrate this ext3 to ext4, and the problem will
recur at that point.  Besides, I'm not the only one who might need
to move such extensively-hardlinked filesystems.]

> > Okay, so the above shows that --link-dest without -R appears to work,
> > BUT--- how come there was no actual output from rsync when it created
> > dst/a/2/foo?  Correct side-effect (foo created, with correct inode),
> > but incorrect output.

> The lack of output here is by design.  That's not to say that I think
> the design is a good one.

I have to confess that I don't, either.  (...but see below.)

> [ . . . ]

> However, the more recently added --copy-dest and --link-dest:

> [ . . . ]

> have the IMHO more useful interpretation that the basis dir is to be
> used as an optimization (of network traffic and/or destination disk
> usage), without affecting either the itemization or the final contents
> of the destination.  I entered an enhancement request for this to be
> supported properly:

> https://bugzilla.samba.org/show_bug.cgi?id=5645

I see where you're going with that; I assume that such an enhancement
would, as fallout, cause itemization of created hardlinks when using
a --dest arg.  (Right now, they're itemized in a "normal" run with -H
but without a --dest, but don't appear if --dest is added, which looks
to someone who hasn't followed the entire history like a bug---and
makes the output less useful, too.)

...though on the other hand, would this dramatically clutter up the
output of a "normal" --link-dest where, typically, one is looking to
see which -new- files got transferred as opposed to seeing the
creation of a zillion files that were in the basis dirs?  (Since
you seem to advocate two different options, I guess that would allow
users to decide either way.)

> [ . . . ]

> Right.  To recap the problem: In order to transfer both b/2/ and c/2/ to
> the proper places under dst/ in a single run, you needed to include the
> "b/2/" and "c/2/" path information in the file list by using -R.  But
> consequently, rsync is going to look for b/2/foo and c/2/foo under
> whatever --link-dest dir you specify, and there's no directory on the
> destination side that contains files at those paths (yet).

So you're saying that there appears to be no way to tell rsync what I
want to do in this case---I haven't missed something, and it's either
a limitation or a design goal that it works this way.  Correct?
[Err, except that perhaps you have a solution below; it's just that
-R is pretty much useless with any of the --*-dests.]

> Tilde expansion is the shell's job.

Right, I realized what was going on just after I sent the mail.
(I was concentrating on the real problem at hand, of course, and
missed that I'd put an = in there, defeating the shell; attributing
tilde expansion to anything but the shell must have meant I'd been
awake too long. :)

> I think using a separate rsync run for each hostX/DATE dir is the way to
> go since it's easy to specify an appropriate --link-dest dir, or more
> than one.  With this approach, you don't need -H unless you want to
> preserve hard links among a single host's files on a single day.

I do need -H for that reason (there are many crosslinked files in any
individual source host---not just in the dirvish vault), but
unfortunately doing a separate run for each hostX/DATE combination
isn't enough either, which is how I got into this problem---the reason
is that there are crosslinks -across- the hosts that I -also- want to
preserve.  Although perhaps your suggestion below is the solution.

(How did this happen?  Because after each date's backups, I run
faster-dupemerge across all hosts (and across the previous date's
run), all at once, e.g. 6 hosts times 2 dates, in my example.  This
merges files that are the same across hosts [distribution-related
stuff, mostly] and also catches files that moved across directories or
across hosts---oh, whoops, I just realized I mentioned this the first
time, but it bears repeating 'cause it's why this is an unusual case.
Not having rsync catch this when I'm copying this giant hierarchy to a
new filesystem would undo the work unless I ran f-d on the copy as it
was being created, which would increase the time to move everything by
quite a lot.)
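One way to read Matt's "or more than one" remark: rsync accepts multiple --link-dest dirs (up to 20), so each per-host run could also point at another host's already-copied same-date snapshot. That would recover cross-host links, though only for files that sit at the same relative path on both hosts (the paths below are hypothetical):

```shell
# Copy hostB's snapshot, linking against hostB's previous date AND
# against hostA's same-date snapshot already present on the destination.
cmd='rsync -aH --link-dest=/mnt/new/vault/hostB/20080101 --link-dest=/mnt/new/vault/hostA/20080102 /backup/vault/hostB/20080102/ /mnt/new/vault/hostB/20080102/'
echo "$cmd"   # inspect first; run it for real once it looks right
```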

> In recent months, several rsnapshot users have posted about migration
> problems similar to yours but one-dimensional (dates only), and I wrote
>

Malformed Address and Private IP issue

2006-03-08 Thread foner-rsync
Date: Wed, 8 Mar 2006 17:15:36 -0800
From: Wayne Davison <[EMAIL PROTECTED]>

On Wed, Mar 08, 2006 at 01:48:37PM -0800, Jonathan Chen wrote:
> 2006/03/08 11:25:12 [16976] malformed address localhost.localdomain

That can't be 2.6.6 because 2.6.6 doesn't have an error message of that
format.  In 2.6.6, the old "malformed address" error now outputs as
"error matching address" and includes the gai_strerror() text after a
colon.  A new error in 2.6.6 that does include the string "malformed
address" would have also included the gai_strerror() text after a colon.
Thus, that's still the old rsync running.  Perhaps it didn't really get
stopped?  Or perhaps it is running via inetd?

Given how often rsync versions change and how much functionality goes
into each new one (yay!), I wonder if it might not be such a bad idea
to have the rsync version embedded in every error message?  With most
programs, it's likely that the user knows at least something about the
version they're running, but since rsync is almost always run with one
of the instantiations remote, it might make debugging easier if the
message was explicit...


Data Encryption

2006-06-12 Thread foner-rsync
Date: Mon, 12 Jun 2006 14:18:00 -0400
From: Matt McCutchen <[EMAIL PROTECTED]>

On Mon, 2006-06-12 at 10:58 -0700, Chuck Wolber wrote:
> On Mon, 12 Jun 2006, Brad Farrell wrote:
> 
> > Is there a way with rsync to encrypt data at the source before 
> > transmitting? Not talking about the actually transmission, but the data 
> > itself.  I've got a few department heads that want their data secured 
> > before it leaves their computer so that no one in the office can access 
> > the data except for them.

Then again, the data is saved decrypted on the destination disk.  Maybe
the files should be individually encrypted with a symmetric algorithm on
the source before transmission.  This could be done with either a script
or the --source-filter option provided by an experimental rsync patch.

Note that typical encryption algorithms prevent incremental transfer
from identifying unchanged portions of a file; rsyncrypto does not but
I'm not sure I trust its security.
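The incremental-transfer point is easy to see with a toy chained cipher (emphatically not real crypto): flip one plaintext byte and every ciphertext block from that point on changes, so rsync's block matcher finds nothing to reuse.

```python
import hashlib

def toy_chain_encrypt(data: bytes, key: bytes, block: int = 16) -> list:
    # Toy CBC-style chaining (NOT real crypto): each block's keystream is
    # derived from the previous ciphertext block, so any plaintext change
    # propagates to every later block.
    prev = b"\x00" * block
    out = []
    for i in range(0, len(data), block):
        chunk = data[i:i + block].ljust(block, b"\x00")
        mask = hashlib.sha256(key + prev).digest()[:block]
        prev = bytes(a ^ b for a, b in zip(chunk, mask))
        out.append(prev)
    return out

c1 = toy_chain_encrypt(b"A" * 160, b"key")
c2 = toy_chain_encrypt(b"B" + b"A" * 159, b"key")  # one byte differs
changed = sum(x != y for x, y in zip(c1, c2))
print(f"{changed} of {len(c1)} blocks differ")
```

rsyncrypto trades some of this propagation away (restarting the chain periodically) to stay rsync-friendly, which is exactly the security tradeoff questioned above.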

The right solution is probably to run an encrypted filesystem on the
machine that holds the backups, and of course to use ssh getting the
files there.  That way, rsync's incremental algorithm is actually of
some use.  While you're at it, put an encrypted filesystem on the
machines the data is coming -from-, too, especially if they're
laptops.  Machines do get stolen.


Data Encryption

2006-06-12 Thread foner-rsync
Date: Mon, 12 Jun 2006 18:01:34 -0400
From: Matt McCutchen <[EMAIL PROTECTED]>

On Mon, 2006-06-12 at 17:51 -0400, [EMAIL PROTECTED] wrote:
> The right solution is probably to run an encrypted filesystem on the
> machine that holds the backups, and of course to use ssh getting the
> files there.

That isn't enough if the department heads don't trust the backup machine
to transfer the data to the encrypted volume without peeking at it in
the process.

True.  In that case, they have no choice but to encrypt locally
---or pick a different backup organization that they -do- trust.


Problem with shared xls file. Could it be blamed on rsync?

2007-03-16 Thread foner-rsync
Date: Fri, 16 Mar 2007 02:30:33 -0700 (PDT)
From: syncro <[EMAIL PROTECTED]>

Thanks alot! That's what I wanted to hear ;)
We want to have an always-up-to-date-copy thus rsync every minute and not
just at night. However my preventive measure will be a forbiddance of
sharing xls files or the like.

Rather than forbidding sharing, maybe you could ask rsync (via
files-from and a filter or something) to back up only files that
haven't been modified in the last 10 minutes?  I don't know exactly
when Windows might update the file's timestamp vs when data starts
getting written to it---and there will always be a tiny timing race
anyway since the scan of the filesystem and the start of the update
aren't simultaneous---but it might be that the file gets backed up
every few minutes when people aren't actively working on it.
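A sketch of that idea (the names are hypothetical): collect the files whose mtime is at least ten minutes old and hand the list to rsync via --files-from.

```python
import os
import time

def stable_files(root: str, min_age_s: int = 600) -> list:
    # Files untouched for min_age_s seconds are "probably not being
    # written to right now" and are safer candidates for backup.
    cutoff = time.time() - min_age_s
    stable = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) <= cutoff:
                stable.append(os.path.relpath(path, root))
    return sorted(stable)

# Write the result to a file, then something like:
#   rsync -a --files-from=stable.txt /data/ backup:/mirror/
```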

The other solution might be to have Windows copy the file to a
temporary location (since Windows might respect its own locks), and
then back up the temporary copy.