Hey Bill, thanks for spending time on this!

> On 20. Jul 2022, at 21:46, Bill Arlofski via Bacula-users 
> <bacula-users@lists.sourceforge.net> wrote:
> 
> 
> Justin,
> 
> I know what you told us, but I think we have a situation that I (and Martin) 
> described:

I understand your experiment, but it is not like that here.

> - FD cannot connect to Director due to firewall

It can.

> - Director CAN connect to FD (I know, I know... :)

It cannot.

> - Job starts, Director connects to FD and receives all the queued "Cannot 
> connect" messages
> - Job runs fine
> 
> 
> Here is how I tested:
> 
> - In my FD config I set in the Director{} block:
> 
>  - ConnectToDirector = yes
>  - A BOGUS IP address for the `Address =` setting
> 
> 
> I killed and restarted the FD in foreground and debug mode, and I see that it 
> goes on to try to connect to an IP address that
> is not taken on my network....
> ----8<----
> speedy-fd: events.c:48-0 Events: code=FD0001 daemon=speedy-fd ref=0x238e 
> type=daemon source=*Daemon* text=Filed startup
> 13.0.0 (04Jul22)
> speedy-fd: filed.c:296-0 filed: listening on port 9102
> speedy-fd: bnet_server.c:90-0 Addresses 0.0.0.0:9102
> speedy-fd: bsockcore.c:354-0 Current 10.1.1.99:9101 All 10.1.1.99:9101
> speedy-fd: bsockcore.c:443-0 Could not connect to server Director daemon 
> 10.1.1.99:9101. ERR=No route to host
> speedy-fd: bsockcore.c:253-0 Unable to connect to Director daemon on 
> 10.1.1.99:9101. ERR=No route to host
> speedy-fd: bsockcore.c:354-0 Current 10.1.1.99:9101 All 10.1.1.99:9101
> speedy-fd: bsockcore.c:443-0 Could not connect to server Director daemon 
> 10.1.1.99:9101. ERR=No route to host
> speedy-fd: bsockcore.c:253-0 Unable to connect to Director daemon on 
> 10.1.1.99:9101. ERR=No route to host
> ----8<----
> 
> Meanwhile, from the Director, I do a `status client=xxxx` and BAMM.. Director 
> connects to Client and I get the FD's status -
> so a Job would also work in this manner.
> 
> 
> From your Director, can you try:

Good thinking. That was the first thing I checked when I saw the errors, 
though. I usually try everything I can think of before I turn to the mailing 
list, but of course you could not know what I had tried, since I did not 
mention it.

> # telnet <IP of Client> 9102

Connection refused

> And from the Client:
> 
> # telnet <IP of Director> 9101

There is no telnet on the client, so I used netcat instead. The connection 
gets established and I can write to it; after a few “invalid keywords” the 
Director closes the connection.

To be sure, I tried again with other port numbers that have no daemons 
listening. There netcat returns immediately (because the port is closed).
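
For anyone who wants to repeat this check on a machine without telnet or 
netcat, here is a minimal Python sketch of the same test. It only attempts a 
TCP connect and reports whether the port is open, refused, or unreachable. The 
function name `probe` and the host names in the usage part are placeholders of 
mine and not anything from Bacula; only the port numbers 9101 and 9102 come 
from this thread.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connect and return 'open', 'refused' or 'unreachable'."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on that port
        return "refused"
    except OSError:
        # Timeouts, "No route to host", name resolution failures, etc.
        return "unreachable"


if __name__ == "__main__":
    # Hypothetical host names: replace them with your own Director and Client.
    checks = [
        ("director.example.org", 9101),  # run this one from the Client
        ("client.example.org", 9102),    # run this one from the Director
    ]
    for host, port in checks:
        print(f"{host}:{port} -> {probe(host, port)}")
```

A "refused" result corresponds to the "Connection refused" I got from the 
Director towards the Client, and "open" to what netcat showed from the Client 
towards the Director.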

> And show us the results?

See above.

In the meantime the schedule ran again.

Today: no connection error messages.

OK, but why? What was different? I had done some experiments earlier, so the 
job had already run twice before.

I also ran another, longer-running job on the other tier. The problematic job 
did not queue up behind it but ran through immediately (so both jobs ran 
simultaneously), and no errors were thrown.

When the schedule started a few minutes ago, the longer-running job was also 
started, as an incremental, and had no files to back up (because it had run a 
few hours earlier and nothing in the fileset had changed since).

Finally, I had commented out the Reconnect clause.

Hard to say which of these was the reason.

I will watch whether it throws connection errors again tomorrow and will 
report back either way. I will not run any experiments on Bacula before the 
schedule runs tomorrow.

> 
> Thank you!
> Bill
> 
> --
> Bill Arlofski
> w...@protonmail.com
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
