On 8/3/2015 6:52 AM, Raimund Sacherer wrote:
> Hello,
>
> we have been using Bacula for about 6 years now and it works great. About 6
> months ago we switched to another Bacula server. The switch included
> changing the OS from Linux to FreeBSD (to get better flexibility with ZFS,
> etc.).
>
> Since the move we have experienced some problems. I keep logs for about 3
> months, so right now I can say that of about 3100 backup jobs, 11 failed
> with:
>
> 02-Aug 09:13 backupserver-dir JobId 110334: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
> 02-Aug 09:13 backupserver-dir JobId 110334: Error: Bacula backupserver-dir 5.2.12 (12Sep12):
>
> But it seems the backup finished just fine. After changing the volume
> status from Error to Used we can restore files.
>
> It seems that after the backup is finished, a communication attempt between 
> the director and the client fails somehow.
>
> All of these clients are on the same LAN. Their backup times are
> comparable, as are the numbers of files, etc.
>
> It *seems* to affect only Windows clients, but I cannot verify this
> because I do not have logs older than 3 months.
>
> I read somewhere that there could be problems with some sort of timeout in
> the FreeBSD network stack, but before twiddling with any knobs I would
> really appreciate hearing from anyone who has had similar problems in the
> past and knows what the root cause is.
>

I have seen this before as well, although not with FreeBSD. Bacula-dir
expects the TCP connection with the client to remain up throughout the
entire job. In my case I concluded that the cause was aggressive Windows
power management shutting down the Ethernet interface PHY. I continue to
have problems with Mac OS X clients whose power management shuts down the
wireless PHY, but have not had time to investigate.

With Windows 7 it is possible to disable the "Allow the computer to turn
off this device to save power" setting in the Power Management tab of the
network adapter's Properties. Whether this is needed depends on the NIC
driver. Some drivers report that they handle various sleep states when
they in fact do not, or at least they do not return to the D0 (fully on)
state in a timely manner.

Another possible culprit on Windows 7 is Energy Efficient Ethernet (EEE).
It can also be disabled, which may help if, for example, the NIC supports
EEE but a switch in between (or the NIC driver on FreeBSD) does not, or if
the EEE implementations somewhere between the client and the Dir do not
agree on the same "standard".

Finally, many switches also have TCP timeout settings and/or EEE and
power-management features that may not work correctly with either the
FreeBSD or the Windows network stack.

In any case, it is almost certainly a network issue rather than a general
Bacula issue. Because Bacula keeps TCP connections open for extended
periods, it is very good at exposing network problems.
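
If a long, mostly idle connection between the Dir and the client is what a
switch, firewall, or power-saving NIC ends up dropping, one generic OS-level
mitigation is TCP keepalive, which makes the kernel send periodic probes on
an otherwise quiet connection. The Python sketch below only illustrates the
socket options involved. It is not Bacula's code; the function name
`enable_keepalive` and the timing values are assumptions chosen for
illustration. Which tuning constants exist depends on the platform and the
Python version, so the sketch sets only the ones that are available.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 300,
                     interval: int = 60, count: int = 5) -> None:
    """Hypothetical helper: turn on TCP keepalive for `sock`.

    idle     -- seconds of silence before the first probe (assumed value)
    interval -- seconds between unanswered probes (assumed value)
    count    -- unanswered probes before the connection is declared dead
    """
    # Ask the kernel to send keepalive probes on this connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning is platform specific. Linux and FreeBSD call the
    # idle timer TCP_KEEPIDLE; macOS exposes the same knob as TCP_KEEPALIVE.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    elif hasattr(socket, "TCP_KEEPALIVE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle)

    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    # Usage: configure a socket before connecting it to a long-lived peer.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print("keepalive on:", s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Keepalive only helps with idle-timeout problems; it will not fix a NIC
whose PHY is powered down or an EEE mismatch, which still have to be
addressed in the driver, OS, or switch settings.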


------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
