Re: [Bacula-users] Bacula Waits for 40min When a Client is Down

Kyle Marsh Thu, 07 Jun 2007 14:05:33 -0700

On 6/7/07, Arno Lehmann <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On 6/7/2007 7:28 PM, Kyle Marsh wrote:
> > Sorry if this double posts -- I used the wrong e-mail and the first
> > copy is at the mercy of the moderator.  You can kill that one, btw.
> >
> > On 6/7/07, Arno Lehmann <[EMAIL PROTECTED]> wrote:
> >> Hi,
> >>
> >> On 6/7/2007 2:42 PM, Steve Campbell wrote:
> >>> I had the same thought about time-outs just yesterday. I am new to
> >>> Bacula, and was testing my second client backup, when I realized I had
> >>> not defined the client in my hosts file. I run my backups on an internal
> >>> network (non-public) so the DNS for this was not available either. I was
> >>> using bconsole, and the job hung up due to not being able to find the
> >>> client,
> >> It should terminate almost immediately.
> >>
> >
> > And yet the fact remains that for both Steve and me, it doesn't.  You
> > suggested pinging the client with RunBeforeJob.  Is there a better way
> > to do this than adding a new line in each job?  You cannot put it in
> > the JobDefs because you need the hostname, and of course you cannot
> > extract that from anything in the Job field as far as I know, so it
> > has to appear magically.
>
> Hmm... right. Using a Python event would perhaps work, but I haven't
> investigated this.

That's an interesting suggestion -- I could have a Python script that
gets called and parses the config file to determine the full name from
the client name.  That gets rid of the magic, at least (or replaces it
with worse magic, depending on your perspective).
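
Kyle's idea of looking up a client's address from its Name can be sketched in Python. This is a minimal, hypothetical helper (`client_addresses` is not part of Bacula). It assumes the Director's Client resources look like `Client { Name = ...; Address = ... }`, contain no `}` inside quoted values, and are not pulled in through `@` include files; a real script would need a proper parser.

```python
import re
from typing import Dict

# One "Client { ... }" resource. Assumes no "}" inside quoted values in the
# block; a production version would need a real tokenizer.
CLIENT_BLOCK = re.compile(r"\bClient\s*\{([^}]*)\}", re.IGNORECASE)


def client_addresses(conf_text: str) -> Dict[str, str]:
    """Return {Client Name: Address} for each Client resource in conf_text."""
    # Drop '#' comments; good enough when only Name and Address matter.
    text = re.sub(r"#[^\n]*", "", conf_text)
    result: Dict[str, str] = {}
    for body in CLIENT_BLOCK.findall(text):
        fields: Dict[str, str] = {}
        # Directives may be separated by newlines or semicolons.
        for item in re.split(r"[;\n]", body):
            key, sep, value = item.partition("=")
            if not sep:
                continue
            # Bacula ignores case and spaces in directive keywords.
            name = "".join(key.split()).lower()
            fields[name] = value.strip().strip('"')
        if "name" in fields and "address" in fields:
            result[fields["name"]] = fields["address"]
    return result
```

Reading the Director config file (its path depends on the installation) and indexing the result by client name would give the host to check before a job runs.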

>
> For my setups, the clients affected by possible non-availability are
> only a minority, so I just added the line to the jobs as needed.
>

Unfortunately I'm not sure how the boxes are to be set up -- most are
machines for students doing research with their professors and they
will change hands and configuration every semester.  I don't know if
users will decide to shut them down overnight or what, so I was hoping
to find a blanket solution that could cover them all.
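
A blanket pre-job check could be a small script called from Run Before Job. The Bacula manual documents that a non-zero exit status from a Run Before Job command cancels the job, so this hypothetical `check_fd.py` (not shipped with Bacula) only has to fail fast. Instead of an ICMP ping it tries a TCP connection to the File daemon's default port, 9102, which also catches a firewalled FD. It assumes the host name arrives as the only argument; how to supply that name per job is exactly the open question in this thread.

```python
import socket
import sys
from typing import List

FD_PORT = 9102  # Bacula's default File daemon port


def fd_reachable(host: str, port: int = FD_PORT, timeout: float = 5.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # Resolves the name and tries each address it returns,
        # each attempt bounded by `timeout`.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, "No route to host", connection refused or timeout.
        return False


def main(argv: List[str]) -> int:
    """Exit status for a Run Before Job command: 0 = run, non-zero = cancel."""
    if len(argv) != 2:
        print("usage: check_fd.py HOST", file=sys.stderr)
        return 2
    host = argv[1]
    if fd_reachable(host):
        return 0
    print(f"File daemon on {host}:{FD_PORT} is unreachable", file=sys.stderr)
    return 1
```

As a standalone script it would end with `sys.exit(main(sys.argv))`. The five-second timeout is an arbitrary choice; a host that silently drops packets still costs up to that long per resolved address.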

> >
> >>> and I was wondering how to kill it immediately, as it wasn't
> >>> going to find the client.
> >> The cancel command...
> >>
> >
> > Of course, I don't believe the cancel command is particularly
> > effective when your console is busy with something else, like waiting
> > 40 minutes for a response that isn't coming.
>
> Ah, I misunderstood the problem... I assumed it was the job that was
> stalled, but it's the console itself.
>

Actually, I think I may have been thinking about the wrong problem.
If I try the status command, the console hangs (it's not unresponsive
and will accept Ctrl-C, it just doesn't give a prompt), but in this
case it recovers in about 30 seconds -- at least it did in the test I
just performed. I had a problem like this with firewalling earlier,
so I was mixing up the symptoms. If I try to manually run the job with
a downed client, it gives the following message:

07-Jun 13:53 mydirector-dir:
Windows_Remote_Client_Test.2007-06-07_13.53.12 Warning: bnet.c:864
Could not connect to File daemon on windows.mynetwork.edu:9102. ERR=No
route to host
Retrying ...

I don't know yet how long this lasts, but my assumption is 40 minutes
just like when it is automatically run.

> This is definitely another problem, then. In my experience, the console
> always returns immediately after a run command, except when the catalog
> database is currently locked (which should only happen while the catalog
> backup is running).
>
> >  You need to start a new
> > bconsole and run cancel from there as far as I can tell.
>
> Yes, when the console is stuck, you're right.
>
> >>> As a new user, I didn't know if there is a
> >>> timeout value I could set, or how long it was going to run (maybe even
> >>> forever?).
>
> For this particular issue, there is nothing you can configure as far as
> I can tell.
>
> If this problem persists and is not related to the catalog database
> backend, I'd suggest either running tcpdump or Wireshark to observe
> what the DIR and console exchange, or running the DIR with debug output.
> That might tell us where the wait time comes from.
>
> Arno
>

Thanks for the help, Arno.  Is there any chance of you taking a look
at my other post about the pool configuration?  That's really the more
pressing issue now.

~Kyle

> >>> So, good question, Arno, and I hope someone provides the answer.
> >> Erm, which was the question?
> >>
> >> :-)
> >>
> >> Arno
> >>
> >>> Steve
> >>>
> >>> Arno Lehmann wrote:
> >>>> Hi,
> >>>>
> >>>> On 6/6/2007 10:57 PM, Kyle Marsh wrote:
> >>>>
> >>>>> Howdy,
> >>>>>
> >>>>> I'm working on a bacula setup for my college and I have found that
> >>>>> when a client goes down, whether it's firewalled, turned off, or
> >>>>> otherwise disconnected from the network, bacula seems to hang for
> >>>>> about 40 minutes
> >>>>>
> >>>> Unusual timeout, in my experience... I'd expect nearly instantaneous job
> >>>> failure or the IT-related two hours...
> >>>>
> >>>>
> >>>>> before deciding that the client isn't there and
> >>>>> stopping.  This could become problematic if we have several machines
> >>>>> down each night and could cause substantial problems if some backups
> >>>>> don't start until people are back working.  Is there a directive that
> >>>>> allows me to specify something sane as the timeout period, and where
> >>>>> does it need to go?
> >>>>>
> >>>> I prefer leaving the timeouts to Bacula, and instead use "Run Before
> >>>> Job" scripts to ping the clients. Concurrent jobs are a reasonable
> >>>> solution against long-running or stalled jobs.
> >>>>
> >>>> Arno
> >>>>
> >>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Kyle Marsh
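
Arno's point about concurrent jobs can be made concrete with a little arithmetic. The sketch below is a generic greedy scheduler, not Bacula's own: jobs start in list order, each takes the first free slot, and a down client is assumed to hold its slot for the full ~40 minutes reported in this thread. The durations are hypothetical; in Bacula the number of slots is governed by the Maximum Concurrent Jobs directives.

```python
import heapq
from typing import List


def nightly_makespan(durations: List[float], slots: int) -> float:
    """Minutes until the last job finishes when jobs start in list order and
    each job takes the first free slot (simple greedy list scheduling)."""
    free_at = [0.0] * slots  # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With two down clients (40 minutes each) followed by ten 5-minute jobs, `nightly_makespan([40, 40] + [5] * 10, n)` gives 130 minutes for one slot, 65 for two and 45 for three: the stalled jobs still cost their 40 minutes, but they stop delaying everything else.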

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
