Quoting "Steven R. Gerber" <open...@gerber-systems.com>:

> On 3/24/2011 4:33 PM, richardtoo...@paradise.net.nz wrote:
> > Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> >
> >> On 3/24/2011 2:36 PM, richardtoo...@paradise.net.nz wrote:
> >>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> >>>
> >>>> -------- Original Message --------
> >>>> Subject: Re: rdist times out but will not die
> >>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
> >>>> From: Richard Toohey <richardtoo...@paradise.net.nz>
> >>>> To: Steven R. Gerber <sger...@gerber-systems.com>
> >>>> CC: t...@openbsd.org
> >>>>
> >>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
> >>>>
> >>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
> >>>>>> I want to do local/remote mirror/backup (or should that be
> >>>>>> local-mirror / offsite-backup).
> >>>>>> So a two-part question:
> >>>>>> 1. Even if there is a timeout, shouldn't the job/process exit?
> >>>>>>
> >>>>>> ************************************************************
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
> >>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
> >>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
> >>>>>> rdist@thedump: LOCAL ERROR: Response time out
> >>>>>> rdist@thedump: updating of rdist@thedump finished
> >>>>>> $ ps -ax|grep rdist
> >>>>>> 26025 ??  I       0:00.00 tee /var/log/rdist/2011-03-20
> >>>>>> 11059 ??  I       0:00.01 rdist -f /etc/Distfile
> >>>>>> 28446 ??  I       0:22.99 rdist: update rdist@thedump (rdist)
> >>>>>>  7795 ??  I       1:10.32 ssh -l rdist thedump r
> >>>>>> 13045 p0  S+      0:00.00 grep rdist
> >>>>>> ************************************************************
> >>>>>>
> >>>>>> 2. I know that they happen from time to time. How can I
> >>>>>> avoid/prevent timeouts? The default is 900 sec AKA 15 min?
> >>>>>> How can this happen between two local machines?
> >>>>
> >>>> How big is the file?
> >>>
> >>> So, how big is the file that it times out on?
> >>>
> >>> More than 2Gb? Guess so if a movie file?
> >>>
> >>> I might be barking up the wrong tree, but it will take you two
> >>> seconds to see if there's anything in this > 2Gb idea and if I'm
> >>> wrong, move on.
> >>>
> >>> Regardless of that, yes, put more debugging on - might give you
> >>> some more clues.
> >>>
> >>> OpenBSD helps those who help themselves.
> >>
> >> Richard,
> >> Thanks for the help.
> >> I had already read the IBM note 'LOCAL ERROR: response time out'
> >> (from 2006). (Google is not my enemy?)
> >> I had already checked: the file is >2GB (4.4GB).
> >> I ASSUMED that I can't be the only one who has tried to push large
> >> files with rdist. I searched the OpenBSD list archives (mine go
> >> back to 2006) and found nothing significant/useful. Maybe I missed
> >> something?
> >> I immediately moved to the misc list per your suggestion.
> >> I did a (manual) run of rdist with "-D" and got similar results --
> >> I am still analyzing those messages.
> >> I usually do not compile OpenBSD, so it will take a while to review
> >> the rdist source code (client.c?).
> >
> > Thanks ... never assume anything, eh? 8-)
> >
> > If your files are > 2Gb, then that IBM link seems to be spot on, and
> > answers (maybe) number 2 on your list - why would you get a timeout
> > on a local transfer (if hardware related, you'd expect sftp to fail,
> > or there to be other noticeable issues)?
> >
> > I've not used rdist before, but I don't mind having a look now that
> > I know your files are > 2Gb. But going to be a quiet (ha!) evening
> > project, so no promises (and maybe someone else will blow the theory
> > out of the water and provide a different answer/fix.)
> >
> > The IBM note suggests that both client & server need to be amended,
> > IF I am on the right track.
> >
> > This is all purely speculative on my part, but it does SEEM to match
> > what you are seeing, doesn't it?
> >
> > Thanks.
> [SNIP]
>
> You are right on it! Thanks!
> Not to be greedy, but ...
> What do you think of the other issue that rdist logs a "finished"
> message but does not exit?
>
> Thanks.
>

More guessing (I'm already out on a limb ... the branch is about to break) ... "something" is unhappy because of the time out?
What messages are in the debug output - do you see "finish() called" as per the code in common.c below? What's the rest of the message(s)?

What happens if you move all the > 2Gb files out of the way temporarily and re-run (obviously I don't know how practical this is)? Does it finish normally?

Or if that doesn't suit, how about creating a test directory with 20 (<2 Gb each) files in, run it, then drop a big file (>2 Gb) in, re-run. If it fails, then I'd say I was on to something (I don't know anything about rdist, so I do not know how to set up this test environment.)

Remove the big file, or truncate it down to < 2Gb and re-run. If that works, I get a cookie.

common.c

void
finish(void)
{
        extern jmp_buf finish_jmpbuf;

        debugmsg(DM_CALL,
            "finish() called: do_fork = %d amchild = %d isserver = %d",
            do_fork, amchild, isserver);
        cleanup(0);

        /*
         * There's no valid finish_jmpbuf for the rdist master parent.
         */
        if (!do_fork || amchild || isserver) {

                if (!setjmp_ok) {
#ifdef DEBUG_SETJMP
                        error("attemping longjmp() without target");
                        abort();
#else
                        exit(1);
#endif
                }

                longjmp(finish_jmpbuf, 1);
                /*NOTREACHED*/
                error("Unexpected failure of longjmp() in finish()");
                exit(2);
        } else
                exit(1);
}

Thanks.