On 3/24/2011 10:06 PM, richardtoo...@paradise.net.nz wrote:
> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
> 
>> On 3/24/2011 5:00 PM, richardtoo...@paradise.net.nz wrote:
>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
>>>
>>>> On 3/24/2011 4:33 PM, richardtoo...@paradise.net.nz wrote:
>>>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
>>>>>
>>>>>> On 3/24/2011 2:36 PM, richardtoo...@paradise.net.nz wrote:
>>>>>>> Quoting "Steven R. Gerber" <open...@gerber-systems.com>:
>>>>>>>
>>>>>>>> -------- Original Message --------
>>>>>>>> Subject: Re: rdist times out but will not die
>>>>>>>> Date: Thu, 24 Mar 2011 21:49:01 +1300
>>>>>>>> From: Richard Toohey <richardtoo...@paradise.net.nz>
>>>>>>>> To: Steven R. Gerber <sger...@gerber-systems.com>
>>>>>>>> CC: t...@openbsd.org
>>>>>>>>
>>>>>>>> On 24/03/2011, at 4:06 PM, Steven R. Gerber wrote:
>>>>>>>>
>>>>>>>>> On 3/20/2011 2:07 PM, Steven R. Gerber wrote:
>>>>>>>>>> I want to do local/remote mirror/backup (or should that be
>>>>>>>>>> local-mirror / offsite-backup).
>>>>>>>>>> So a two-part question:
>>>>>>>>>> 1.   Even if there is a timeout, shouldn't the job/process exit?
>>>>>>>>>>
>>>>>>>>>> ******************************************************************************
>>>>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies: chown from rdist:operator to cdripper:operator
>>>>>>>>>> rdist@thedump: thedump: /mnt/mirror2/public/read_only/movies/The_Thomas_Crown_Affair_1999: chown from rdist:operator to root:operator
>>>>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.md5: updating
>>>>>>>>>> rdist@thedump: /mnt/stripe2/public/read_only/movies/The_Thomas_Crown_Affair_1999/THOMAS_CROWN_AFFAIR_16X9.iso: installing
>>>>>>>>>> rdist@thedump: LOCAL ERROR: Response time out
>>>>>>>>>> rdist@thedump: updating of rdist@thedump finished
>>>>>>>>>> $ ps -ax|grep rdist
>>>>>>>>>> 26025 ?? I 0:00.00 tee /var/log/rdist/2011-03-20
>>>>>>>>>> 11059 ?? I 0:00.01 rdist -f /etc/Distfile
>>>>>>>>>> 28446 ?? I 0:22.99 rdist: update rdist@thedump (rdist)
>>>>>>>>>> 7795 ?? I 1:10.32 ssh -l rdist thedump r
>>>>>>>>>> 13045 p0 S+ 0:00.00 grep rdist
>>>>>>>>>>
>>>>>>>>>> ******************************************************************************
>>>>>>>>>> 2.   I know that they happen from time to time. How can I
>>>>>>>>>> avoid/prevent timeouts? The default is 900 sec AKA 15 min? How can
>>>>>>>>>> this happen between two local machines?
>>>>>>>>
>>>>>>>> How big is the file?
>>>>>>>
>>>>>>> So, how big is the file that it times out on?
>>>>>>>
>>>>>>> More than 2Gb? Guess so if a movie file?
>>>>>>>
>>>>>>> I might be barking up the wrong tree, but it will take you two
>>>>>>> seconds to see if there's anything in this > 2Gb idea and if I'm
>>>>>>> wrong, move on.
>>>>>>>
>>>>>>> Regardless of that, yes, put more debugging on - might give you
>>>>>>> some more clues.
>>>>>>>
>>>>>>> OpenBSD helps those who help themselves.
>>>>>> Richard,
>>>>>> Thanks for the help.
>>>>>> I had already read the IBM note 'LOCAL ERROR: response time out'
>>>>>> (from 2006). (Google is not my enemy?)
>>>>>> I had already checked: the file is >2GB (4.4GB).
>>>>>> I ASSUMED that I can't be the only one who has tried to push large
>>>>>> files with rdist. I searched the OpenBSD list archives (mine go back
>>>>>> to 2006) and found nothing significant/useful. Maybe I missed something?
>>>>>> I immediately moved to the misc list per your suggestion.
>>>>>> I did a (manual) run of rdist with "-D" and got similar results -- I
>>>>>> am still analyzing those messages.
>>>>>> I usually do not compile OpenBSD, so it will take a while to review
>>>>>> the rdist source code (client.c?).
>>>>>
>>>>> Thanks ... never assume anything, eh? 8-)
>>>>>
>>>>> If your files are > 2Gb, then that IBM link seems to be spot on, and
>>>>> answers (maybe) number 2 on your list - why would you get a timeout
>>>>> on a local transfer (if hardware related, you'd expect sftp to fail,
>>>>> or there to be other noticeable issues)?
>>>>>
>>>>> I've not used rdist before, but I don't mind having a look now that I
>>>>> know your files are > 2Gb. But going to be a quiet (ha!) evening
>>>>> project, so no promises (and maybe someone else will blow the theory
>>>>> out of the water and provide a different answer/fix.)
>>>>>
>>>>> The IBM note suggests that both client & server need to be amended,
>>>>> IF I am on the right track.
>>>>>
>>>>> This is all purely speculative on my part, but it does SEEM to match
>>>>> what you are seeing, doesn't it?
>>>>>
>>>>> Thanks.
>>>> [SNIP]
>>>>
>>>> You are right on it! Thanks!
>>>> Not to be greedy, but ...
>>>> What do you think of the other issue that rdist logs a "finished"
>>>> message but does not exit?
>>>>
>>>> Thanks.
>>>>
>>>>
>>> More guessing (I'm already out on a limb ... the branch is about to
>>> break) ... "something" is unhappy because of the time out?
>>>
>>> What messages are in the debug output - do you see "finish() called"
>>> as per the code in common.c below? What's the rest of the message(s)?
>>>
>>> What happens if you move all the > 2Gb files out of the way temporarily
>>> and re-run (obviously I don't know how practical this is)? Does it
>>> finish normally?
>>>
>>> Or if that doesn't suit, how about creating a test directory with 20
>>> (<2 Gb each) files in, run it, then drop a big file (>2 Gb) in, re-run.
>>> If it fails, then I'd say I was on to something (I don't know anything
>>> about rdist, so I do not know how to set up this test environment.)
>>> Remove the big file, or truncate it down to < 2Gb and re-run. If that
>>> works, I get a cookie.
>>>
>>> common.c
>>>
>>> void
>>> finish(void)
>>> {
>>>         extern jmp_buf finish_jmpbuf;
>>>
>>>         debugmsg(DM_CALL,
>>>                  "finish() called: do_fork = %d amchild = %d isserver = %d",
>>>                  do_fork, amchild, isserver);
>>>         cleanup(0);
>>>
>>>         /*
>>>          * There's no valid finish_jmpbuf for the rdist master parent.
>>>          */
>>>         if (!do_fork || amchild || isserver) {
>>>
>>>                 if (!setjmp_ok) {
>>> #ifdef DEBUG_SETJMP
>>>                         error("attemping longjmp() without target");
>>>                         abort();
>>> #else
>>>                         exit(1);
>>> #endif
>>>                 }
>>>
>>>                 longjmp(finish_jmpbuf, 1);
>>>                 /*NOTREACHED*/
>>>                 error("Unexpected failure of longjmp() in finish()");
>>>                 exit(2);
>>>         } else
>>>                 exit(1);
>>> }
>>>
>>> Thanks.
>>>
>>>
>>>
>>
>> I am getting the "finish() called" etc.
>> I now have a theory (your "something" unhappy guess): rdist times out,
>> but the child process does not and is still trying to get the
>> end-of-file. The child is basically in an infinite loop: it does not
>> time out because thedump does respond but it keeps retrieving from the
>> first part of the file -- it never reaches past the miscalculated size.
>>
>>  
> 
> My diffs will no doubt get mangled by my webmail and I don't know enough about
> rdist (or the rdist protocol) to know if these are correct.
> 
> Hopefully they are a step in the right direction.
> 
> Basic idea from https://www-304.ibm.com/support/docview.wss?uid=isg1IY85396
> 
> (I was going to look at FreeBSD's version for inspiration but looks like they
> ditched rdist in 2003.)
> 
> Basically strtol to strtoll, %ld to %lld, and (int)/(long) to (off_t) to cope
> with files bigger than 2Gb.
> 
> Works for me on i386 - without these patches I see the reported behaviour,
> with the patches I see the 4Gb file transferred.
> 
> With patches - it works:
> 
> $ cat rdist.conf                                                         
> HOSTS = (172.16.1.111)
> FILES = (/home/richard.toohey/rdist-test)
> ${FILES} -> ${HOSTS}
> 
> $ rdist -f rdist.conf  
> 172.16.1.111: updating host 172.16.1.111
> richard.toohey@172.16.1.111's password: 
> 172.16.1.111: /home/richard.toohey/rdist-test/zerofile.tst: installing
> 172.16.1.111: updating of 172.16.1.111 finished
> 
> zerofile.tst created with:
> 
> dd if=/dev/zero of=zerofile.tst bs=1k count=4700000
> 
> HTH.
> 
> /usr/src/usr.bin/rdist/client.c
> ===============================
> 
> # diff -uw /home/richard.toohey/obsd-src/usr.bin/rdist/client.c client.c 
> --- /home/richard.toohey/obsd-src/usr.bin/rdist/client.c        Thu Oct 29 17:34:06 2009
> +++ client.c    Fri Mar 25 14:54:32 2011
> @@ -399,8 +399,8 @@
>          */
>         ENCODE(ername, rname);
>  
> -       (void) sendcmd(C_RECVREG, "%o %04o %ld %ld %ld %s %s %s", 
> -                      opts, stb->st_mode & 07777, (long) stb->st_size, 
> +       (void) sendcmd(C_RECVREG, "%o %04o %lld %ld %ld %s %s %s", 
> +                      opts, stb->st_mode & 07777, (off_t) stb->st_size, 
>                        stb->st_mtime, stb->st_atime,
>                        user, group, ername);
>         if (response() < 0) {
> @@ -409,8 +409,8 @@
>         }
>  
>  
> -       debugmsg(DM_MISC, "Send file '%s' %ld bytes\n", rname,
> -                (long) stb->st_size);
> +       debugmsg(DM_MISC, "Send file '%s' %lld bytes\n", rname,
> +                (off_t) stb->st_size);
>  
>         /*
>          * Set remote time out alarm handler.
> @@ -666,8 +666,8 @@
>          * Gather and send basic link info
>          */
>         ENCODE(ername, rname);
> -       (void) sendcmd(C_RECVSYMLINK, "%o %04o %ld %ld %ld %s %s %s", 
> -                      opts, stb->st_mode & 07777, (long) stb->st_size, 
> +       (void) sendcmd(C_RECVSYMLINK, "%o %04o %lld %ld %ld %s %s %s", 
> +                      opts, stb->st_mode & 07777, (off_t) stb->st_size, 
>                        stb->st_mtime, stb->st_atime,
>                        user, group, ername);
>         if (response() < 0)
> @@ -682,7 +682,7 @@
>                 error("%s: readlink failed", target);
>                 err();
>         }
> -       (void) snprintf(tbuf, sizeof(tbuf), "%.*s", (int) stb->st_size, lbuf);
> +       (void) snprintf(tbuf, sizeof(tbuf), "%.*s", (off_t) stb->st_size, lbuf);
>         ENCODE(ername, tbuf);
>         (void) sendcmd(C_NONE, "%s\n", ername);
>  
> @@ -869,7 +869,7 @@
>         /*
>          * Parse size
>          */
> -       size = (off_t) strtol(cp, (char **)&cp, 10);
> +       size = (off_t) strtoll(cp, (char **)&cp, 10);
>         if (*cp++ != ' ') {
>                 error("update: size not delimited");
>                 return(US_NOTHING);
> @@ -921,8 +921,8 @@
>  
>         debugmsg(DM_MISC, "update(%s,) local mode %04o remote mode %04o\n", 
>                  rname, lmode, rmode);
> -       debugmsg(DM_MISC, "update(%s,) size %ld mtime %d owner '%s' grp '%s'\n",
> -                rname, (long) size, mtime, owner, group);
> +       debugmsg(DM_MISC, "update(%s,) size %lld mtime %d owner '%s' grp '%s'\n",
> +                rname, (off_t) size, mtime, owner, group);
>  
>         if (statp->st_mtime != mtime) {
>                 if (statp->st_mtime < mtime && IS_ON(opts, DO_YOUNGER)) {
> @@ -935,8 +935,8 @@
>         }
>  
>         if (statp->st_size != size) {
> -               debugmsg(DM_MISC, "size does not match (%ld != %ld).\n",
> -                        (long) statp->st_size, (long) size);
> +               debugmsg(DM_MISC, "size does not match (%lld != %lld).\n",
> +                        (off_t) statp->st_size, (off_t) size);
>                 return(US_OUTDATE);
>         } 
> 
> /usr/src/usr.bin/rdistd/server.c
> ================================
> # diff -uw /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c server.c 
> --- /home/richard.toohey/obsd-src/usr.bin/rdistd/server.c       Thu Oct 29 17:34:06 2009
> +++ server.c    Fri Mar 25 14:49:18 2011
> @@ -391,7 +391,7 @@
>  #else
>         /*
>          * We use MT_NOTICE instead of MT_CHANGE because this function is
> -        * sometimes called by other functions that are suppose to return a
> +        * sometimes called by other functions that are supposed to return a
>          * single ack() back to the client (rdist).  This is a kludge until
>          * the Rdist protocol is re-done.  Sigh.
>          */
> @@ -656,8 +656,8 @@
>         case S_IFIFO:
>  #endif
>  #endif
> -               (void) sendcmd(QC_YES, "%ld %ld %o %s %s",
> -                              (long) stb.st_size, stb.st_mtime,
> +               (void) sendcmd(QC_YES, "%lld %ld %o %s %s",
> +                              (off_t) stb.st_size, stb.st_mtime,
>                                stb.st_mode & 07777,
>                                getusername(stb.st_uid, target, options), 
>                                getgroupname(stb.st_gid, target, options));
> @@ -1420,7 +1420,7 @@
>         /*
>          * Get file size
>          */
> -       size = strtol(cp, &cp, 10);
> +       size = strtoll(cp, &cp, 10);
>         if (*cp++ != ' ') {
>                 error("recvit: size not delimited");
>                 return;
> @@ -1523,7 +1523,7 @@
>          */
>         if (min_freespace || min_freefiles) {
>                 /* Convert file size to kilobytes */
> -               long fsize = (long) (size / 1024);
> +               off_t fsize = (off_t) (size / 1024);
>  
>                 if (getfilesysinfo(target, &freespace, &freefiles) != 0)
>                         return;
> 
> Thanks.
> 
> 
> 

Wow!
I had not seen your message and started editing client.c ...
Your changes are about the same as mine, but ...
Why cast size, statp->st_size, etc. to (off_t) when that is their
defined type?  Style?
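
On the cast question, here is a minimal sketch in C, assuming off_t is
64 bits wide. parse_size(), format_size() and format_link() are
hypothetical helpers for illustration only, not rdist functions. As far
as I can tell, the cast that matters in a printf-style call is the one
to the type the conversion expects: (long long) for %lld, and (int) for
a "%.*s" precision. Casting an off_t to (off_t) changes nothing, and
only lines up with %lld on platforms where off_t is long long.

```c
/* glibc needs this on 32-bit targets for a 64-bit off_t; harmless elsewhere. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not in rdist): parse a decimal size field.
 * Returns 0 on success, -1 if there are no digits, the value is
 * negative, or it overflows long long.
 */
static int
parse_size(const char *field, off_t *out)
{
        char *end;
        long long v;

        assert(out != NULL);
        errno = 0;
        v = strtoll(field, &end, 10);   /* strtol() would clamp at LONG_MAX on i386 */
        if (end == field || errno == ERANGE || v < 0)
                return -1;
        *out = (off_t)v;
        return 0;
}

/* Hypothetical helper: print a size; %lld wants a long long argument. */
static int
format_size(char *buf, size_t len, off_t size)
{
        return snprintf(buf, len, "%lld", (long long)size);
}

/* Hypothetical helper: a "%.*s" precision must be an int. */
static int
format_link(char *buf, size_t len, const char *target, off_t tlen)
{
        return snprintf(buf, len, "%.*s", (int)tlen, target);
}
```

With these, a size such as 4812800000 (the 4700000 x 1k test file)
should parse and print back unchanged, where a 32-bit long could not
hold it.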

Is the comparison at line 689 a problem because 'n' is an int?
        if (n != stb->st_size) {
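
A sketch of what I would expect here, under assumptions: I have not
checked what 'n' holds at that line, and count_matches() and
fits_in_int() are hypothetical names, not rdist code. In C the int
operand is converted to the wider off_t before the comparison, so the
test itself should not truncate the size. It would only go wrong if 'n'
were expected to carry a count larger than INT_MAX.

```c
/* glibc needs this on 32-bit targets for a 64-bit off_t; harmless elsewhere. */
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <limits.h>
#include <sys/types.h>

/*
 * Hypothetical check: does an int byte count equal an off_t size?
 * The explicit cast only spells out the conversion C already does
 * for "n != size": the int is widened, the size is not narrowed.
 */
static int
count_matches(int n, off_t size)
{
        assert(n >= 0);         /* a byte count is assumed non-negative */
        return (off_t)n == size;
}

/* Hypothetical check: can this size be held in an int counter at all? */
static int
fits_in_int(off_t size)
{
        return size >= 0 && size <= INT_MAX;
}
```

So a small count compared against a small size is fine, while a 4 GB
size simply never equals any int value instead of wrapping round to a
false match.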

Thanks.
