Hello there,

I've been using rsync to synchronise a web cluster with a small amount
of data (about 100 to 200 megabytes), using rsh as transport, and
everything seems to work fine on it.  But when I try the same methods on
a web cluster with about 4 gigabytes of data, I start having problems
with rsync and its transport hanging.

I started out by simply copying the method from the small server to the
large one: using a wrapper script to ensure that a second rsync is not
run while the first one is still copying, and using rsh as transport.  When
the process hung overnight, I used loads of -v's to get the most
information.  What I found was that it was hanging in two different
places, after messages like the following:

        make_file(4,clhc/bin/tar)
        client_run waiting on 61359

The first one can be shifted using the --timeout parameter, but all this
does is cancel the hung process--it doesn't solve the problem.  I
couldn't do anything to get rid of the second one, but I did note that
the process ID was that of an rsh process: rsync was waiting for it to
finish.  Both rsync and rsh were still running at the remote end as
well.
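
For anyone wanting to reproduce the setup, here is a minimal sketch of
that kind of wrapper, written in Python purely for illustration.  It is
not the actual script: the lock file, the paths, the host name "web2"
and the timeout value are all hypothetical placeholders.  It takes a
non-blocking lock so that a second run exits while the first is still
copying, and passes --timeout so that a stalled transfer is at least
cancelled.

```python
import fcntl
import subprocess
import sys

LOCK_PATH = "/tmp/rsync-cluster.lock"  # hypothetical lock file location


def build_rsync_command(src, dest, transport="ssh", timeout=600, verbosity=3):
    """Return the rsync argument list; nothing is executed here."""
    # --timeout is rsync's I/O timeout in seconds; -e selects the remote shell.
    cmd = ["rsync", "-a", "--timeout=%d" % timeout, "-e", transport]
    if verbosity > 0:
        cmd.append("-" + "v" * verbosity)  # e.g. -vvv for extra detail
    cmd += [src, dest]
    return cmd


def acquire_lock(path):
    """Take a non-blocking exclusive lock on path.

    Returns the open file object (keep it open to hold the lock),
    or None if another run already holds the lock.
    """
    handle = open(path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def main():
    lock = acquire_lock(LOCK_PATH)
    if lock is None:
        print("previous rsync still running, skipping", file=sys.stderr)
        return 1
    try:
        # Placeholder source and destination, not the real cluster layout.
        cmd = build_rsync_command("/var/www/", "web2:/var/www/")
        return subprocess.call(cmd)
    finally:
        lock.close()  # closing the file releases the lock
```

A cron job would finish with sys.exit(main()).  As noted above, the
timeout only stops the hung transfer; it does nothing about the cause.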

So I decided to get my finger out and set up ssh authorisation keys, so
I could use ssh as transport instead.  Nothing has changed: I still get
hung processes at both of the points described above, this time with
the ssh process, leaving rsync and ssh apparently standing around doing
nothing at both the local and remote ends.

Next I am going to try using rsync's own transport, but in the interests
of security I really would like to get the ssh transport working.  I've
had a look at the FAQ through FAQ-O-Matic, and played with --blocking-io
as it suggests, to no avail.

Does anybody have a clue as to what the problem might be?  I'd be really
grateful for any help on this as I've already stuck my neck out at my
company and championed rsync in preference to other clustering solutions
:-)

-- 
Kind Regards
Damian Walker
Unix System Administrator
Poptel Ltd.
