This is due to the algorithm Bacula uses for connect timeouts.  It isn't
really a timeout; it is a retry count.  If you take the connect timeout in
seconds and divide it by 10, you get the number of retries.  The algorithm
doesn't account for the time spent in the connect call itself.  If the
connect call took no time at all to fail, the two would be the same thing.
To make matters worse, the connect call takes a different amount of time to
fail depending on whether or not a switch is involved.

So in your case, 5 minutes is 300 seconds, and 300 divided by 10 is 30, so
you will get 30 retries.
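
As a rough sketch of that rule (this is not Bacula's actual code; the
function names are made up for illustration, and the 10 second wait between
tries is inferred from the divide-by-10 rule), the configured "timeout"
behaves like this:

```python
RETRY_INTERVAL = 10  # seconds between attempts (assumed from the divide-by-10 rule)


def retry_count(connect_timeout_s: int) -> int:
    """Number of connect attempts implied by a configured timeout."""
    return connect_timeout_s // RETRY_INTERVAL


def worst_case_elapsed(connect_timeout_s: int, fail_time_s: float) -> float:
    """Wall-clock time if every attempt fails after fail_time_s seconds.

    Each attempt costs the time the connect call takes to fail plus the
    wait before the next try, so the total exceeds the configured timeout
    whenever fail_time_s is greater than zero.
    """
    return retry_count(connect_timeout_s) * (RETRY_INTERVAL + fail_time_s)


if __name__ == "__main__":
    timeout = 5 * 60  # "FD Connect Timeout = 5 minutes"
    print(retry_count(timeout))              # 30
    print(worst_case_elapsed(timeout, 0.0))  # 300.0
```

Only when the connect call fails instantly does the elapsed time equal the
configured 5 minutes; any per-attempt delay is multiplied by 30.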

Now, on the same subnet, it takes 6 minutes and 36 seconds to do 30 retries.
Take away the 5 minutes of waiting between retries and that leaves 1 minute
and 36 seconds for 30 calls to connect to fail, or roughly 3 seconds per try.

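On different subnets, it takes 1 hour 39 minutes and 31 seconds.  Take away
the 5 minutes of waiting between retries and that leaves 5671 seconds for 30
calls to connect to fail, which is 189 seconds or roughly 3 minutes per try.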
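
The per-try figures can be checked from the elapsed times in the job reports.
This is only a worked version of the arithmetic; it assumes 30 retries with 5
minutes of total waiting between them, and the names are illustrative:

```python
from datetime import timedelta

RETRIES = 30                    # 300 s / 10
WAITING = timedelta(minutes=5)  # assumed: 30 waits of 10 s each


def per_try_seconds(elapsed: timedelta) -> float:
    """Average time one connect call took to fail."""
    return (elapsed - WAITING).total_seconds() / RETRIES


same_subnet = timedelta(minutes=6, seconds=36)
other_subnet = timedelta(hours=1, minutes=39, seconds=31)

print(round(per_try_seconds(same_subnet), 1))   # 3.2
print(round(per_try_seconds(other_subnet), 1))  # 189.0
```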

The reason for the difference is probably caching on the switch.  I suspect
that in the same-subnet case the ARP is failing (so the IP address can't be
converted to an Ethernet address), while in the other case the switch is
responding to the ARP and a higher-level (and longer) timeout is coming into
play, probably the TCP connect timer.
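
One data point in favor of the TCP connect timer: the thread doesn't say
which OS the director runs on, but assuming a TCP stack that starts with a
3 second retransmission timeout, doubles it each time, and retransmits the
SYN 5 times (the defaults on Linux kernels of this period, as far as I know),
a connect to a silent host gives up after exactly the per-try figure seen
above.  A small sketch of that sum, with hypothetical parameter names:

```python
def syn_backoff_total(initial_rto_s: float = 3.0, syn_retries: int = 5) -> float:
    """Total time before a connect call gives up under exponential backoff.

    The first SYN waits initial_rto_s, and each of the syn_retries
    retransmissions waits twice as long as the one before it.
    Both defaults are assumptions, not values taken from the thread.
    """
    return sum(initial_rto_s * 2 ** i for i in range(syn_retries + 1))


print(syn_backoff_total())  # 189.0
```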

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marc Brückner
Sent: Thursday, October 26, 2006 5:45 AM
To: bacula-users@lists.sourceforge.net
Cc: Knischka; Holger Luedecke
Subject: [Bacula-users] Different timeouts in different subnets

Hi @ all Bacula users,

I have been using Bacula for several years now and I am really satisfied
with it. But now I have a strange problem. I am not sure, but I think it
first occurred after I updated from version 1.36 to 1.38. I am now running
1.38.11.
My Bacula has to back up several WinXP clients overnight.
When the client is running, there is no problem and the backup is done
properly.
But if the users switch off their clients (which unfortunately happens
often), the duration of the timeout depends on the IP subnet the client
is in.
I have the following timeout settings in bacula-dir.conf:

 FD Connect Timeout = 5 minutes
 SD Connect Timeout = 5 minutes

If the client is in the same IP subnet as the Bacula director, the
director reports:

24-Oct 08:43 Bacula-dir: Start Backup JobId 3040,
Job=StudentA2190_A.2006-10-23_19.40.53
24-Oct 08:44 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Warning:
bnet.c:853 Could not connect to File daemon on 192.168.10.67:9102. ERR=No
route to host
Retrying ...
24-Oct 08:50 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Fatal error:
bnet.c:859 Unable to connect to File daemon on 192.168.10.67:9102. ERR=No
route to host
24-Oct 08:50 Bacula-dir: StudentA2190_A.2006-10-23_19.40.53 Error: Bacula
1.38.11 (28Jun06): 24-Oct-2006 08:50:29
 
 ...

  Scheduled time:         23-Oct-2006 19:40:52
  Start time:             24-Oct-2006 08:43:53
  End time:               24-Oct-2006 08:50:29
  Elapsed time:           6 mins 36 secs
  

Timeout after 6 and a half minutes, ERR=No route to host; that's OK.
But if the client resides in a different IP subnet, it says:

25-Oct 02:39 Bacula-dir: Start Backup JobId 3070,
Job=StudentA1080_A.2006-10-24_18.00.11
25-Oct 02:45 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Warning:
bnet.c:853 Could not connect to File daemon on 192.168.30.33:9102.
ERR=Connection timed out
Retrying ...
25-Oct 04:18 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Fatal error:
bnet.c:859 Unable to connect to File daemon on 192.168.30.33:9102.
ERR=Connection timed out
25-Oct 04:18 Bacula-dir: StudentA1080_A.2006-10-24_18.00.11 Error: Bacula
1.38.11 (28Jun06): 25-Oct-2006 04:18:47

...

  Scheduled time:         24-Oct-2006 18:00:10
  Start time:             25-Oct-2006 02:39:16
  End time:               25-Oct-2006 04:18:47
  Elapsed time:           1 hour 39 mins 31 secs

Timeout after 1 hour and 40 minutes, ERR=Connection timed out; that's a
little bit long.

I have looked at many log entries and it's always the same: same subnet =>
6 min, different subnet => 1:40 h.
There is no packet filtering between the subnets.

Has anyone experienced behavior like that? Does anyone have a hint for me
on how to shorten this 1:40 h timeout?

Thank you for your help.

Marc

-- 

Marc Brückner

Institute of
Shipping Economics and Logistics       Mail to: [EMAIL PROTECTED]
Universitaetsallee GW 1 Block A        Phone:   +49 421 22096-67
28359 Bremen                           Fax:     +49 421 22096-55


-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job
easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


