You could try the patch in http://bugs.bacula.org/view.php?id=2083 (I recently attached a fix for 7.4.4 to this bug).
>>>>> On Sun, 12 Feb 2017 12:09:21 +0100, Kern Sibbald said: > > Hello Dan, > > Well, then we are back to my theory that FreeBSD wants a > particular length in some of the system calls. > > In general, Bacula uses lengths that work for both IPv4 and IPv6, > which means that if you are using IPv4, the packet lengths in some > calls are longer than the really need to be. This may or may not > be true in your case, but is a path to look at. > > Best regards, > > Kern > > On 02/11/2017 11:08 PM, Dan Langille wrote: > > On Feb 11, 2017, at 2:11 AM, Kern Sibbald <k...@sibbald.com> wrote: > > Hello Dan, > > Well, I am happy to know that it is not Bacula related > :-) > > Well, it's not solved yet. This just in. > > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: > Elapsed time=03:20:24, Transfer rate=29.63 M Bytes/second > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: > bsock.c:375 Socket has errors=1 on call to client:Hostname > not found:9103 > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: > fd_cmds.c:142 Read data not accepted > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: > bsock.c:375 Socket has errors=1 on call to client:Hostname > not found:9103 > > 9103? That must be an SD for which someone > cannot figure out the hostname based on the IP address. > > Would it not be easier to list the IP address > when the hostname cannot be found? That would give a > starting point. > > Let's go to the SD in question: bacula-sd-01-sd > which is at bacula-sd-01.int.unixathome.org (the > Read Storage listed below). From there, > > I try to resolve the write SD (tape01 at tape01.int.unixathome.org), > which is what this message must pertain to. > > $ dig +short tape01.int.unixathome.org > 10.55.0.103 > > $ dig +short -x 10.55.0.103 > tape01.int.unixathome.org. > > That looks OK. Perhaps I misunderstand which > hostname it's trying to provide. > > Is there something else I should verify with > respect to DNS in this instance? > > I looked at the code: The message comes from > line 374 of src/lib/bsock.c > > The 'Hostname not found' comes from line 646 > of src/lib/address_conf.c > > The code invokes getnameinfo(3) and I looked up > that info .http://www.unix.com/man-page/FreeBSD/3/getnameinfo/ > > I'm happy to patch with debugging code for this, > but I'm out of time this weekend. > > The full job output is here: > > 11-Feb 17:50 bacula-dir JobId 256374: Warning: > FileSet MD5 digest not found. > 11-Feb 17:50 bacula-dir JobId 256374: Copying using > JobId=255999 Job=knew_jail_snapshots.2017-02-05_03.04.02_47 > 11-Feb 17:50 bacula-dir JobId 256374: Bootstrap records > written to > /usr/local/bacula/working/bacula-dir.restore.10.bsr > 11-Feb 17:50 bacula-dir JobId 256374: Start Copying JobId > 256374, Job=CopyToTape-Full-LTO4.2017-02-11_17.50.17_22 > 11-Feb 17:50 bacula-dir JobId 256374: Using Device > "vDrive-6" to read. > 11-Feb 17:50 bacula-sd-01-sd JobId 256374: Ready to read > from volume "FullAuto-1377" on file device "vDrive-6" > (/usr/local/bacula/volumes). > > lots of reading skipped..... > > 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Ready to read > from volume "FullAuto-1514" on file device "vDrive-6" > (/usr/local/bacula/volumes). > 11-Feb 21:00 bacula-sd-01-sd JobId 256374: Forward spacing > Volume "FullAuto-1514" to file:block 0:214. > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: > bsock.c:453 Wrote 65540 bytes to client:Hostname not > found:9103, but only 32768 accepted. > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: > read.c:284 Error sending to File daemon. ERR=Broken pipe > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Elapsed > time=03:20:24, Transfer rate=29.63 M Bytes/second > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: > bsock.c:375 Socket has errors=1 on call to client:Hostname > not found:9103 > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Fatal error: > fd_cmds.c:142 Read data not accepted > 11-Feb 21:11 bacula-sd-01-sd JobId 256374: Error: > bsock.c:375 Socket has errors=1 on call to client:Hostname > not found:9103 > 11-Feb 21:11 bacula-dir JobId 256374: Error: Bacula > bacula-dir 7.4.4 (28Sep16): > Build OS: amd64-portbld-freebsd11.0 freebsd > 11.0-RELEASE-p6 > Prev Backup JobId: 255999 > Prev Backup Job: > knew_jail_snapshots.2017-02-05_03.04.02_47 > New Backup JobId: 256375 > Current JobId: 256374 > Current Job: > CopyToTape-Full-LTO4.2017-02-11_17.50.17_22 > Backup Level: Full > Client: crey-fd > FileSet: "EmptyCopyToTape" 2011-02-20 > 20:53:31 > Read Pool: "FullFile" (From Command input) > Read Storage: "bacula-sd-01-file" (From Pool > resource) > Write Pool: "FullsLTO4" (From Command input) > Write Storage: "tape01" (From Command input) > Catalog: "MyCatalog" (From Client resource) > Start time: 11-Feb-2017 17:50:50 > End time: 11-Feb-2017 21:11:14 > Elapsed time: 3 hours 20 mins 24 secs > Priority: 10 > SD Files Written: 404,305 > SD Bytes Written: 356,347,133,351 (356.3 GB) > Rate: 29636.3 KB/s > Volume name(s): 000013L4 > Volume Session Id: 143 > Volume Session Time: 1486562541 > Last Volume Bytes: 1,187,871,455,232 (1.187 TB) > SD Errors: 3 > SD termination status: Error > Termination: *** Copying Error *** > > -- > > Dan Langille - BSDCan / PGCon > > d...@langille.org > > Kern > > On 02/10/2017 09:46 PM, Dan Langille wrote: > > On Feb 10, 2017, at 12:55 PM, Dan > Langille <d...@langille.org> > wrote: > > On Feb 10, 2017, at 12:20 PM, > Kern Sibbald <k...@sibbald.com> > wrote: > > Hello, > > I suspect that this is a problem with > the FreeBSD networking > implementation. If I remember right > on FreeBSD, when doing name lookups, > if the packet size is not *exactly* > what FreeBSD wants, it fails the > call. On Linux and other machines > (Solaris, Mac), as long as the packet > size is equal or greater than what is > needed the OS call succeeds. If I am > not mistaken, Bacula allocates space > for the larger of IPv4 and IPv6 (which > is always IPv6), and so if you are > using an IPv4 network, Bacula may send > OS calls with a packet size larger > than actually required. > > I just spoke with a FreeBSD > developer. They are unaware of anything > special in the FreeBSD ports tree for > patching FreeBSD when it comes to doing name > lookups. Specifically, gethostby*(), > getipnodeby*() just work... > > If you can reproduce/encounter a > situation which fails, we will look at it > and fix it. In short, I do not think this > is an issue with the FreeBSD networking > implementation. > > I suspect it's a local DNS > misconfiguration on one of my hosts. Which > ones, I don't know yet. Your first post > mentioned the SDs, so I checked them. They > seem OK now. I will verify them again if I > see them again. > > For testing purposes, I will use > dig +short and verify that all of these FQDN > resolve on the FD, the SD, the director, and > from from a fourth host: > > The FQDN of the FD (even though > it is not used in a Copy job) > > The FQDN of the read SD > > The FQDN of the write SD > > The FQDN of the Director > > I will also check that the PTR > record for the A record also resolves back > to the FQDN. > > If this is > the case, I would consider it a > FreeBSD bug. For me to fix it is a > bit complicated, because I need to > know exactly what call is failing and > the values that FreeBSD wants. By the > way, it is possible this is already > fixed in the Enterprise version where > FreeBSD is supported too. If that is > the case, in my next round of > backporting to start next week, it > will get fixed. > > If this was the case, I'd expect apps on > FreeBSD to be failing everywhere. They > aren't. I've never patched anything for DNS > issues either. > > I suspect it's more likely to be > an issue on one of the Bacula nodes in > question (SD or FD) where there is a local DNS > issue. > > For documentation: I found that one of my > three name servers was not correctly resolving PTR > records. That is: > > $ dig +short -x 10.52.0.1 @10.55.0.1 > > Whereas the other two nameservers: > > $ dig +short -x 10.52.0.1 @10.55.0.13 > bast.int.unixathome.org. > > $ dig +short -x > 10.52.0.1 @10.55.0.73 > bast.int.unixathome.org. > > That issue since > been fixed. It took a while to track down. > That nameserver is running pfSense, using > their GUI. The issue is right there on the > page: > > Note: > IN-ADDR.ARPA will be automaticaly included in > config files when reverse zone option is > checked. > > The checkbox is > four controls farther down the page. > > The expected zone > 0.55.10.in-addr.arpa was actually zone > 0.55.10.in-addr.arpa.in-addr.arpa > > I'll keep an eye > on the Bacula logs to see if this shows up > again. > > Thank you. > > -- > > Dan Langille - BSDCan / PGCon > > d...@langille.org > > ------------------------------------------------------------------------------ Check out the vibrant tech community on one of the world's most engaging tech sites, SlashDot.org! http://sdm.link/slashdot _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users