(Versions: OpenSSH_3.7.1p2, rsync version 2.6.2)
I've just encountered a situation where "rsync -v -n" appears to run normally,
but reports many fewer file transfers than actually get done when you remove
the -n. (This is not one of the usual "-n" corner cases.)
It turns out that this only happens when you're doing a remote rsync over ssh AND you redirect stderr into a pipe that fills up, as in:
    rsync -e ssh -avn host:/path /local/path 2>&1 | tee LOG
I can get the right answer by simply not capturing stderr; i.e. removing the "2>&1" and just saying:
    rsync -e ssh -avn host:/path /local/path | tee LOG
The data loss occurs when the pipe (to tee here) fills, so in principle you could lose output even without the "-n"; it is just less likely when the output is generated more slowly.
After poking around with strace, it seems that rsync's child ssh sets its stdERR non-blocking, and that this stderr has been inherited unchanged from the top-level rsync. (rsync supplies pipes for its child's stdin and stdout, but leaves stderr alone; see rsync-2.6.2/pipe.c::piped_child().)
Because of the "2>&1", the top-level stderr is a dup of the top-level stdout, and the non-blocking flag belongs to the open file description that the two descriptors share, so ssh has inadvertently made rsync's stdOUT non-blocking. Rsync is not expecting that, and does not check the return code from fflush(stdout), so it can silently drop lines from stdout when a write fails because the pipe is full. (See the end of rsync-2.6.2/log.c::rwrite().)
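The mechanism can be reproduced outside rsync and ssh. The following is a minimal Python sketch, assuming a POSIX system; the names stdout_like and stderr_like are made up for illustration and merely stand in for rsync's stdout and the stderr that ssh inherits. It shows that setting O_NONBLOCK through one descriptor changes its dup as well, and that a write to the full pipe then fails instead of blocking.

```python
import fcntl
import os


def is_nonblocking(fd):
    """Return True if O_NONBLOCK is set on fd's open file description."""
    return bool(fcntl.fcntl(fd, fcntl.F_GETFL) & os.O_NONBLOCK)


# A pipe that nobody reads from, standing in for "| tee LOG".
read_end, stdout_like = os.pipe()

# "2>&1" makes stderr a dup of stdout; both share one open file description.
stderr_like = os.dup(stdout_like)

before = is_nonblocking(stdout_like)

# What ssh appears to do to its stderr: turn on O_NONBLOCK.
flags = fcntl.fcntl(stderr_like, fcntl.F_GETFL)
fcntl.fcntl(stderr_like, fcntl.F_SETFL, flags | os.O_NONBLOCK)

# The flag is now visible through the "stdout" descriptor too.
after = is_nonblocking(stdout_like)

# Fill the pipe through "stdout". A blocking descriptor would hang here;
# a non-blocking one fails with EAGAIN (BlockingIOError in Python).
written = 0
write_failed = False
try:
    while True:
        written += os.write(stdout_like, b"x" * 4096)
except BlockingIOError:
    write_failed = True

print(before, after, write_failed, written)
```

A writer that ignores that failure, as rsync appears to do with the result of fflush(stdout), loses whatever it was trying to write at that moment.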
CVS has basically the same problem, as discussed at http://groups.google.com/groups?th=e4df2fdc1f4f4950, which mentions some workarounds that the CVS people considered.
It's not clear whether the problem should really be fixed in rsync, ssh, or glibc, but in the meantime, would it be worth adding a warning to the docs/FAQ/known-issues/wherever?
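For illustration only, here is one shape a writer-side fix could take: treat EAGAIN as "wait and retry" rather than ignoring it. This is a Python sketch under my own assumptions, not what rsync, ssh, or CVS actually do; write_all and demo are hypothetical names introduced for the example.

```python
import os
import select
import threading


def write_all(fd, data):
    """Write all of data to fd, even if fd has been made non-blocking."""
    view = memoryview(data)
    while view:
        try:
            n = os.write(fd, view)
        except BlockingIOError:
            # Pipe is full: wait until it is writable again, then retry.
            select.select([], [fd], [])
            continue
        view = view[n:]  # keep whatever part was not written yet


def demo(total=1 << 20):
    """Push `total` bytes through a non-blocking pipe; return bytes received."""
    r, w = os.pipe()
    os.set_blocking(w, False)  # simulate the flag being set behind our back
    received = []

    def drain():
        # Reader side, playing the part of tee.
        while True:
            chunk = os.read(r, 65536)
            if not chunk:
                break
            received.append(len(chunk))

    reader = threading.Thread(target=drain)
    reader.start()
    write_all(w, b"x" * total)
    os.close(w)
    reader.join()
    os.close(r)
    return sum(received)
```

With a loop like this, demo() should report that every byte arrived, whereas a single unchecked write on the same descriptor could come up short.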