On 3/24/2011 5:00 PM, richardtoo...@paradise.net.nz wrote:
> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> 
>> On 3/24/2011 4:33 PM, richardtoo...@paradise.net.nz wrote:
>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
>>>
>>>> On 3/24/2011 2:36 PM, richardtoo...@paradise.net.nz wrote:
>>>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
>>>>>
>>>>>> -------- Original Message --------
>>>>>> Subject: Re: rdist times out but will not die
>>>>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
>>>>>> From: Richard Toohey <richardtoo...@paradise.net.nz>
>>>>>> To: Steven R. Gerber <sger...@gerber-systems.com>
>>>>>> CC: t...@openbsd.org
>>>>>>
>>>>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
>>>>>>
>>>>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
>>>>>>>> I want to do local/remote mirror/backup (or should that be
>>>>>>>> local-mirror / offsite-backup).
>>>>>>>> So a two-part question:
>>>>>>>> 1.     Even if there is a timeout, shouldn't the job/process exit?
>>>>>>>>
>>>>>>>> ******************************************************************************
>>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
>>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
>>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
>>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
>>>>>>>> rdist@thedump: LOCAL ERROR: Response time out
>>>>>>>> rdist@thedump: updating of rdist@thedump finished
>>>>>>>> $ ps -ax|grep rdist
>>>>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
>>>>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
>>>>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
>>>>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
>>>>>>>> 13045 p0 S+ 0:00.00 grep rdist
>>>>>>>>
>>>>>>>> ******************************************************************************
>>>>>>>> 2.     I know that they happen from time to time. How can I
>>>>>>>> avoid/prevent timeouts? The default is 900 sec AKA 15 min? How can
>>>>>>>> this happen between two local machines?
>>>>>>
>>>>>> How big is the file?
>>>>>
>>>>> So, how big is the file that it times out on?
>>>>>
>>>>> More than 2Gb? Guess so if a movie file?
>>>>>
>>>>> I might be barking up the wrong tree, but it will take you two seconds
>>>>> to see if there's anything in this > 2Gb idea and if I'm wrong, move on.
>>>>>
>>>>> Regardless of that, yes, put more debugging on - might give you some
>>>>> more clues.
>>>>>
>>>>> OpenBSD helps those who help themselves.
>>>> Richard,
>>>> Thanks for the help.
>>>> I had already read the IBM note 'LOCAL ERROR: response time out' (from
>>>> 2006). (Google is not my enemy?)
>>>> I had already checked: the file is >2GB (4.4GB).
>>>> I ASSUMED that I can't be the only one who has tried to push large files
>>>> with rdist. I searched the OpenBSD list archives (mine go back to 2006)
>>>> and found nothing significant/useful. Maybe I missed something?
>>>> I immediately moved to the misc list per your suggestion.
>>>> I did a (manual) run of rdist with "-D" and got similar results -- I am
>>>> still analyzing those messages.
>>>> I usually do not compile OpenBSD, so it will take a while to review the
>>>> rdist source code (client.c?).
>>>
>>> Thanks ... never assume anything, eh? 8-)
>>>
>>> If your files are > 2Gb, then that IBM link seems to be spot on, and
>>> answers (maybe) number 2 on your list - why would you get a timeout on a
>>> local transfer (if hardware related, you'd expect sftp to fail, or there
>>> to be other noticeable issues)?
>>>
>>> I've not used rdist before, but I don't mind having a look now that I
>>> know your files are > 2Gb. But going to be a quiet (ha!) evening project,
>>> so no promises (and maybe someone else will blow the theory out of the
>>> water and provide a different answer/fix.)
>>>
>>> The IBM note suggests that both client & server need to be amended, IF
>>> I am on the right track.
>>>
>>> This is all purely speculative on my part, but it does SEEM to match
>>> what you are seeing, doesn't it?
>>>
>>> Thanks.
>> [SNIP]
>>
>> You are right on it! Thanks!
>> Not to be greedy, but ...
>> What do you think of the other issue that rdist logs a "finished"
>> message but does not exit?
>>
>> Thanks.
>>
>>  
> More guessing (I'm already out on a limb ... the branch is about to break) ...
> "something" is unhappy because of the time out?
> 
> What messages are in the debug output - do you see "finish() called" as per
> the code in common.c below?  What's the rest of the message(s)?
> 
> What happens if you move all the > 2Gb files out the way temporarily and
> re-run (obviously I don't know how practical this is)?  Does it finish
> normally?
> 
> Or if that doesn't suit, how about creating a test directory with 20 (<2 Gb
> each) files in, run it, then drop a big file (>2 Gb) in, re-run.  If it fails,
> then I'd say I was on to something (I don't know anything about rdist, so I do
> not know how to set up this test environment.)  Remove the big file, or
> truncate it down to < 2Gb and re-run.  If that works, I get a cookie.
> 
> common.c
> 
> void
> finish(void)
> {
>         extern jmp_buf finish_jmpbuf;
>
>         debugmsg(DM_CALL,
>                  "finish() called: do_fork = %d amchild = %d isserver = %d",
>                  do_fork, amchild, isserver);
>         cleanup(0);
>
>         /*
>          * There's no valid finish_jmpbuf for the rdist master parent.
>          */
>         if (!do_fork || amchild || isserver) {
>
>                 if (!setjmp_ok) {
> #ifdef DEBUG_SETJMP
>                         error("attemping longjmp() without target");
>                         abort();
> #else
>                         exit(1);
> #endif
>                 }
>
>                 longjmp(finish_jmpbuf, 1);
>                 /*NOTREACHED*/
>                 error("Unexpected failure of longjmp() in finish()");
>                 exit(2);
>         } else
>                 exit(1);
> }
> 
> Thanks.
> 
> 
> 

I am getting the "finish() called" message, etc.
I now have a theory (your "something" unhappy guess): rdist times out,
but the child process does not and is still trying to reach the
end-of-file.  The child is basically in an infinite loop: it does not
time out because thedump does respond, but it keeps retrieving from the
first part of the file -- it never gets past the miscalculated size.
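
To show what I mean by a "miscalculated size", here is a minimal sketch in C.
It ASSUMES that somewhere between client and server the file size passes
through a 32-bit integer; I have not confirmed that in the rdist source.
The function names are made up for illustration and are not from rdist, and
4400000000 is only a stand-in for the real byte count of the 4.4GB file.

```c
#include <stdint.h>

/*
 * ASSUMPTION: hypothetical helpers, not taken from rdist.  They model
 * what happens if a 64-bit file size is squeezed through a 32-bit field.
 */

/* Return only the low 32 bits of the size, as a narrowing store would. */
uint32_t
low32_of_size(int64_t real_size)
{
        return (uint32_t)((uint64_t)real_size & 0xffffffffu);
}

/* Return 1 if the size can be held in a signed 32-bit integer, else 0. */
int
fits_in_int32(int64_t real_size)
{
        return real_size >= 0 && real_size <= INT32_MAX;
}
```

With the stand-in value, low32_of_size(4400000000LL) comes out as 105032704,
so a 4.4GB file would look like one of roughly 105MB, and fits_in_int32()
returns 0 for it.  Under that assumption, the two sides could disagree about
where the file ends, which would match what I am seeing.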
