On 2012-11-02 19:32, Justin T Pryzby wrote:
On Fri, Nov 02, 2012 at 02:33:13PM +0100, Christian Iversen wrote:
However, 1 server is giving me a lot of trouble. It has a directory
with (currently) 734088 files in it, and every time I try to backup
this dir, rsync hangs after transferring roughly 2000 files.
Sometimes it's around 1800, sometimes it's over 2100 (I think), but
it's in that ballbark.

Is that using rsync:// or rsync/ssh ?  What is the commandline?

Wow, I can't believe I forgot the commandline :D

It's this:

/rsync -v -rlpt --specials --partial --chmod=Du+rwx --chmod=Fu+r --rsh='ssh -F /etc/spye/backup/main/ssh/config -i /etc/spye/backup/main/ssh/private.key -o UserKnownHostsFile=/etc/spye/backup/main/ssh/known_hosts' --files-from=/etc/spye/backup/main/include.list --exclude-from=/etc/spye/backup/main/exclude.list --link-dest=$LAST_BACKUP_DIR / $USER@$SERVER:$TARGET

There are probably two rsync processes on the server.  Could you
strace both?  How much vram are they using?  ps e -O vsz -C rsync

That was a bit tricky, since the whole process hangs shortly after being started (the files are not that big). But I managed to strace -f -p $PID1 -p $PID2 the whole thing.

And this is SUPER strange. I expected it to hang soon after starting (like it did the first couple of times), but now with strace, it seems to keep running. Slower, of course, but it seems to be running. Is there potential for a race condition here? Or does it simply take longer to get there, because of the tracing?

Anyway, it eventually DID hang, leaving this strace:

[pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999994}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999994}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999995}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999997}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [1], [1], {60, 0}) = 1 (in [5], left {59, 999996}) [pid 27310] select(6, [5], [], NULL, {60, 0}) = 1 (in [5], left {59, 999997}) [pid 27310] read(5, "\4\0\0kA*\0\0\4\0\0kB*\0\0\4\0\0kC*\0\0\4\0\0kD*\0\0"..., 8184) = 208
[pid 27310] select(6, [5], [1], [1], {60, 0}


That {59, 99999x} looks highly suspicious. rsync wouldn't happen to hold open 100000 file handles, would it? Or something to that effect?

stats with ps e:


27310 108504 T ?        00:00:18 /bin/rsync --server ...
27311 117472 S ?        00:00:03 /bin/rsync --server ...

(server has 6GB of available RAM more, so not pressed for memory exactly..)

What filesystem is it?  Can you create a tarball of the directory?

Tarball could be problematic. It's 8.6GB of data spread over 734088 files in a single dir. Yes, that many. Yes, I know it's insane. No, I didn't choose it :)

time c /path/to/dir |wc

What's "c"?

I took a guess:

$ time find usrPictures | wc -l
734088
find usrPictures  0,36s user 0,62s system 91% cpu 1,057 total
wc -l  0,02s user 0,06s system 8% cpu 1,056 total

--
De bedste hilsner,

Christian Iversen
Systemadministrator, Meebox.net

-------
Denne e-mail kan indeholde fortrolige
oplysninger. Er du ikke den rette modtager,
bedes du returnere og slette denne e-mail.
-------
--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Reply via email to