The last two scheduled runs of the job again had the connection errors (and
again the backup was still taken fine). Yesterday I even ran the
longer-running job a few hours ahead to see if this was the reason why the
connection error disappeared the other night, but that was not it. Also,
commenting out the Reconnect clause didn’t make a difference.

The only two things I am aware of that I can do now to investigate further:

(1) use a connection schedule for the FD
(2) downgrade the FD from 13 to 11 (can this really be the cause?)
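
The reachability checks discussed in the quoted thread below (telnet/netcat
against ports 9101 and 9102) could be repeated before each scheduled run with
a small script. This is only a sketch, assuming Python 3 is available on the
machine doing the check; the function name `check_port` and the `host:port`
command-line format are made up for illustration and are not part of Bacula.

```python
import socket
import sys


def check_port(host, port, timeout=3.0):
    """Try a plain TCP connect; return (ok, reason). Hypothetical helper."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except ConnectionRefusedError:
        # Nothing is listening on that port (or a firewall rejects it).
        return False, "refused"
    except socket.timeout:
        # No answer at all, e.g. a firewall silently dropping packets.
        return False, "timeout"
    except OSError as exc:
        # Anything else: no route to host, name resolution failure, ...
        return False, f"error: {exc}"


if __name__ == "__main__":
    # Each argument is expected as host:port; nothing runs without arguments.
    for target in sys.argv[1:]:
        host, port = target.rsplit(":", 1)
        ok, reason = check_port(host, int(port))
        print(f"{target}: {'OK' if ok else 'FAILED'} ({reason})")
```

Saved under any name, it would be run from the Director as
`python3 <script> <IP of Client>:9102` and from the Client as
`python3 <script> <IP of Director>:9101`. It only tests whether a TCP
connection can be opened, not whether the Bacula authentication succeeds.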

> On 20. Jul 2022, at 22:35, Justin Case <jus7inc...@gmail.com> wrote:
> 
> 
> Hey Bill, thanks for spending time on this!
> 
>> On 20. Jul 2022, at 21:46, Bill Arlofski via Bacula-users 
>> <bacula-users@lists.sourceforge.net> wrote:
>> 
>> 
>> Justin,
>> 
>> I know what you told us, but I think we have a situation that I (and Martin) 
>> described:
> 
> I understand your experiment, but it is not like that here.
> 
>> - FD cannot connect to Director due to firewall
> 
> It can.
> 
>> - Director CAN connect to FD (I know, I know... :)
> 
> It cannot.
> 
>> - Job starts, Director connects to FD and receives all the queued "Cannot 
>> connect" messages
>> - Job runs fine
>> 
>> 
>> Here is how I tested:
>> 
>> - In my FD config I set in the Director{} block:
>> 
>>  - ConnectToDirector = yes
>>  - A BOGUS IP address for the `Address =` setting
>> 
>> 
>> I killed and restarted the FD in foreground and debug mode, and I see that 
>> it goes on to try to connect to an IP address that
>> is not taken on my network....
>> ----8<----
>> speedy-fd: events.c:48-0 Events: code=FD0001 daemon=speedy-fd ref=0x238e 
>> type=daemon source=*Daemon* text=Filed startup
>> 13.0.0 (04Jul22)
>> speedy-fd: filed.c:296-0 filed: listening on port 9102
>> speedy-fd: bnet_server.c:90-0 Addresses 0.0.0.0:9102
>> speedy-fd: bsockcore.c:354-0 Current 10.1.1.99:9101 All 10.1.1.99:9101
>> speedy-fd: bsockcore.c:443-0 Could not connect to server Director daemon 
>> 10.1.1.99:9101. ERR=No route to host
>> speedy-fd: bsockcore.c:253-0 Unable to connect to Director daemon on 
>> 10.1.1.99:9101. ERR=No route to host
>> speedy-fd: bsockcore.c:354-0 Current 10.1.1.99:9101 All 10.1.1.99:9101
>> speedy-fd: bsockcore.c:443-0 Could not connect to server Director daemon 
>> 10.1.1.99:9101. ERR=No route to host
>> speedy-fd: bsockcore.c:253-0 Unable to connect to Director daemon on 
>> 10.1.1.99:9101. ERR=No route to host
>> ----8<----
>> 
>> Meanwhile, from the Director, I do a `status client=xxxx` and BAMM.. 
>> Director connects to Client and I get the FD's status -
>> so a Job would also work in this manner.
>> 
>> 
>> From your Director, can you try:
> 
> Good thinking. This was the first thing I checked when I saw the errors, 
> though. I usually try everything I can think of before I turn to the mailing 
> list, but of course you cannot know what I tried, as I did not mention it.
> 
>> # telnet <IP of Client> 9102
> 
> Connection refused
> 
>> And from the Client:
>> 
>> # telnet <IP of Director> 9101
> 
> No telnet there, so I used netcat instead; the connection gets established. 
> I can write stuff, and after some “invalid keywords” the connection is 
> closed by the Director.
> 
> To be sure, I tried again with other port numbers that have no daemons 
> running. netcat returns immediately (due to the port being closed).
> 
>> And show us the results?
> 
> see above.
> 
> In the meantime the schedule ran again.
> 
> Today: no connection error messages.
> 
> OK OK, but why? What was different? I did some experiments earlier, so the 
> job did run twice before.
> 
> Also I ran another longer running job on the other tier, but actually the 
> problematic job did not queue up but ran through immediately (so both jobs 
> ran simultaneously) and no errors were thrown.
> 
> When the schedule started a few minutes ago, the longer-running job was also 
> started as “incremental” and had no files to back up (because it ran a few 
> hours before and no changes had been made in the fileset).
> 
> Finally, I had commented out the Reconnect clause.
> 
> Hard to say what was the reason. 
> 
> I will observe whether or not it throws connection errors tomorrow and will 
> report back in either case. And I will not make any experiments on Bacula 
> before the schedule runs tomorrow.
> 
>> 
>> Thank you!
>> Bill
>> 
>> --
>> Bill Arlofski
>> w...@protonmail.com <mailto:w...@protonmail.com>
>> _______________________________________________
>> Bacula-users mailing list
>> Bacula-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/bacula-users
