Robert Gerber
402-237-8692
r...@craeon.net

On Wed, Mar 5, 2025, 8:06 AM Dan Langille <d...@langille.org> wrote:

> On Tue, Mar 4, 2025, at 11:03 PM, Rob Gerber wrote:
>
> I don't think that the problem is in bacula, for sure. I suspect other
> traffic over the link might be similarly impacted. My searching indicated
> that 0a000119 is a generic openssl error. Could be many things. I might be
> suspicious of the openssl version or implementation installed on your new
> router / firewall. The router or firewall may have flawed firmware.
>
>
> Is that theory contingent upon some kind of hardware acceleration on said
> firewall?
>

I don't know if hardware acceleration could be contributing to this. Maybe?

> If so, I should be able to verify that that is occurring and perhaps
> disable that acceleration so it's all done in software, removing the
> firmware from the equation.
>

I agree, swap to a simpler configuration and see if it has an impact.

>
>
> Consider running wireshark to analyze failed ssl transactions.
>
> I think I maybe got lucky when I searched this in duckduckgo. The top
> result contained something sort of relevant, with further breadcrumbs to
> chase. The next million results didn't even contain the 0a000119 keyword
> and look unrelated.
> Check out this, and follow the links therein, and the links inside those
> links. I have about 10 tabs open now and I see some interesting stuff. Some
> people turned off segmentation offloading on their nic, others made new
> certs, others got rid of their netgear router. 0a000119 is a vague error.
>
> https://forum.proxmox.com/threads/decryption-failed-or-bad-record-on-remote-sync.145131/
>
>
> Thank you for that research. It is appreciated.
>
> Have you verified that data can be sent over the network link? I assume
> yes, so what about data larger than a single packet size (ie, if a packet
> is fragmented, then what happens?)?
>
>
> The network link of the firewall? Yes. I think that is fine and working
> as expected.
> To test, I ran "wget
> https://download.freebsd.org/releases/ISO-IMAGES/14.2/FreeBSD-14.2-RELEASE-amd64-memstick.img
> ".
>
> It completed in about 42s without errors. I verified the checksum is
> correct.
>
> Does that do the test you wanted? This test does not involve the VPN.
>

I think that is a good test, because it does appear to verify that the network is up and functional on at least one end. I meant to test through the VPN, which you did in subsequent tests. However, verifying the stability and function of your network on EACH end in a manner like you did here is an important test, since it appears to be up, yet doesn't "just work". I would be curious to see if you are able to send traffic directly from host to host without any VPN involved, though I think simply testing the remote end's ability to download a large file successfully could be more important. I would want to check each end for internet connection packet loss by running a continuous ping to some stable internet IP like 8.8.8.8.

>
> However, your suggestion made me try another test:
>
> [8:42 pro02 dan ~/tmp] % time scp -r foo.example:~bar/backups/Bacula .
>
> That grabs all the .bsr files I've backed up to that how. The copy
> involves about 2.6M and 221 files.
>
> Let's try that same backup over the VPN:
>
> [8:43 pro02 dan ~/tmp] % time scp -r
> foo.vpn.example.org:~rsyncer/backups/Bacula
> Bacula-vpn
> .. about five files are copied
> 0% 0 0.0KB/s --:-- ETAssh_dispatch_run_fatal: Connection to
> 10.14.0.217 port 22: message authentication code incorrect
> scp: Connection closed
> scp -r foo.vpn.example.org:~rsyncer/backups/Bacula Bacula-vpn 0.21s user
> 0.02s system 25% cpu 0.938 total
>
> To me, that says something is very wonky with the VPN.
>

I agree, though I would want to test for packet loss on each end. Maybe file downloads are more resilient than OpenVPN traffic and your OpenVPN session is only serving as a canary here.
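
For what it's worth, here is a rough sketch of how that ping check could be scripted so the loss figures are comparable between ends and between packet sizes. It assumes Python 3 and a stock `ping` that accepts `-c` and `-s` (both FreeBSD and Linux do). The function names and the list of payload sizes are only illustrative choices of mine, and some systems restrict large `-s` values to root.

```python
import re
import subprocess
import sys

# Matches the summary line printed by both FreeBSD and Linux ping,
# e.g. "... 0.0% packet loss" or "... 25% packet loss, time 3004ms".
LOSS_RE = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def ping_command(host, payload_bytes, count=20):
    """Build a ping invocation with a fixed ICMP payload size."""
    return ["ping", "-c", str(count), "-s", str(payload_bytes), host]


def packet_loss_percent(ping_output):
    """Return the loss percentage from ping's summary, or None if absent."""
    match = LOSS_RE.search(ping_output)
    return float(match.group(1)) if match else None


def sweep(host, sizes, count=20):
    """Ping host once per payload size; map size -> loss % (None if no summary)."""
    results = {}
    for size in sizes:
        proc = subprocess.run(
            ping_command(host, size, count), capture_output=True, text=True
        )
        results[size] = packet_loss_percent(proc.stdout)
    return results


if __name__ == "__main__" and len(sys.argv) == 2:
    # 1472 bytes of payload plus 28 bytes of IPv4/ICMP headers fills a
    # 1500-byte MTU; the larger sizes have to be fragmented on such a link.
    for size, loss in sweep(sys.argv[1], [56, 1400, 1472, 2000, 4000]).items():
        status = "no summary from ping" if loss is None else f"{loss}% loss"
        print(f"{size:>5} bytes: {status}")
```

Run it with the target address as the only argument: once against something like 8.8.8.8 from each end, then against the far side of the VPN (10.14.0.217 in your scp output). If the small sizes come back clean and the large ones don't, that points at fragmentation rather than general packet loss.
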
If you have a basic network connectivity issue, it'd be better to find that before you get elbows deep into openvpn and troubleshooting openssl. I wonder if ping tests would successfully pass through the VPN. If yes, this implies that small data can pass, but not bigger data. That sort of failure mode, if verified, would indicate an issue when a packet is fragmented. If pings won't pass, then I'd wonder if you ever had a working VPN configuration despite the paths appearing to be up.

> Which also means, this is not a Bacula issue but a transport issue - solve
> that first, and the Bacula issue should resolve.
>
> Does that make sense?
>

I agree with your conclusion. This appears to be a transport issue.

>
> Thank you
>
>
> Regards,
> Robert Gerber
> 402-237-8692
> r...@craeon.net
>
>
> On Sun, Mar 2, 2025 at 2:17 PM Dan Langille <d...@langille.org> wrote:
>
> Hello,
>
> I have several clients which have recently start failing with:
>
> SD says - Error: openssl.c:108 TLS read/write failure.:
> ERR=error:0A000119:SSL routines::decryption failed or bad record mac
> FD says - Error: bsock.c:397 Wrote 43011 bytes to Storage daemon:
> bacula-sd-04.int.unixathome.org:9103, but only 0 accepted.
> SD says - Fatal error: append.c:327 Network error reading from FD.
> ERR=Unknown error: 9919
>
> My search for these errors isn't finding anything to try.
>
> Full job output appears below.
>
> Relevant background:
>
> * Bacula 15.0.2 installed on clients and servers
> * FreeBSD 14.x
> * the failing clients have been around for years, with successful backups
> * the gateway in my basement was recently replaced, new firewall rules and
>   OpenVPN configuration
> * the OpenVPN topology went from net40 to subnet
> * this affects all the VPN hosts; local hosts are unaffected
>
> Given the Bacula configuration hasn't changed and these jobs had been
> running successfully for years, the problem must be with OpenVPN and/or the
> firewall rules. However, I cannot find the cause.
>
> Here is a failed job:
>
> 02-Mar 19:06 bacula-dir JobId 373736: Start Backup JobId 373736,
> Job=r720-02_basic.2025-03-02_19.06.22_22
> 02-Mar 19:06 bacula-dir JobId 373736: Connected to Storage
> "bacula-sd-04-FullFile" at bacula-sd-04.int.unixathome.org:9103 with TLS
> 02-Mar 19:06 bacula-dir JobId 373736: There are no more Jobs associated
> with Volume "FullAuto-04-15375". Marking it purged.
> 02-Mar 19:06 bacula-dir JobId 373736: All records pruned from Volume
> "FullAuto-04-15375"; marking it "Purged"
> 02-Mar 19:06 bacula-dir JobId 373736: Recycled volume "FullAuto-04-15375"
> 02-Mar 19:06 bacula-dir JobId 373736: Using Device "vDrive-FullFile-4" to
> write.
> 02-Mar 19:06 bacula-dir JobId 373736: Connected to Client "r720-02-fd" at
> r720-02.vpn.unixathome.org:9102 with TLS
> 02-Mar 19:06 r720-02-fd JobId 373736: Connected to Storage at
> bacula-sd-04.int.unixathome.org:9103 with TLS
> 02-Mar 19:06 bacula-sd-04 JobId 373736: Recycled volume
> "FullAuto-04-15375" on File device "vDrive-FullFile-4"
> (/usr/local/bacula/volumes/FullFile), all previous data lost.
> 02-Mar 19:06 bacula-dir JobId 373736: Max Volume jobs=1 exceeded. Marking
> Volume "FullAuto-04-15375" as Used.
> 02-Mar 19:06 bacula-sd-04 JobId 373736: Error: openssl.c:108 TLS
> read/write failure.: ERR=error:0A000119:SSL routines::decryption failed or
> bad record mac
> 02-Mar 19:06 r720-02-fd JobId 373736: Error: bsock.c:397 Wrote 43011 bytes
> to Storage daemon:bacula-sd-04.int.unixathome.org:9103, but only 0
> accepted.
> 02-Mar 19:06 bacula-sd-04 JobId 373736: Fatal error: append.c:327 Network
> error reading from FD. ERR=Unknown error: 9919
> 02-Mar 19:06 r720-02-fd JobId 373736: Fatal error: backup.c:1057 Network
> send error to SD.
> ERR=Input/output error
> 02-Mar 19:06 bacula-sd-04 JobId 373736: Elapsed time=00:00:01, Transfer
> rate=16.71 M Bytes/second
> 02-Mar 19:06 r720-02-fd JobId 373736: Error: bsock.c:276 Socket has
> errors=1 on call to Storage daemon:bacula-sd-04.int.unixathome.org:9103
> 02-Mar 19:06 bacula-dir JobId 373736: Error: Director's connection to SD
> for this Job was lost.
> 02-Mar 19:06 bacula-dir JobId 373736: Error: Bacula bacula-dir 15.0.2
> (21Mar24):
>   Build OS:               amd64-portbld-freebsd14.1 freebsd 14.1-RELEASE
>   JobId:                  373736
>   Job:                    r720-02_basic.2025-03-02_19.06.22_22
>   Backup Level:           Full
>   Client:                 "r720-02-fd" 15.0.2 (21Mar24)
>                           amd64-portbld-freebsd14.1,freebsd,14.1-RELEASE
>   FileSet:                "basic backup" 2019-10-28 03:05:00
>   Pool:                   "FullFile-04" (From Job FullPool override)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "bacula-sd-04-FullFile" (From Pool resource)
>   Scheduled time:         02-Mar-2025 19:06:19
>   Start time:             02-Mar-2025 19:06:25
>   End time:               02-Mar-2025 19:06:26
>   Elapsed time:           1 sec
>   Priority:               10
>   FD Files Written:       91
>   SD Files Written:       0
>   FD Bytes Written:       17,352,576 (17.35 MB)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   17352.6 KB/s
>   Software Compression:   100.0% 1.0:1
>   Comm Line Compression:  45.9% 1.8:1
>   Snapshot/VSS:           no
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):         FullAuto-04-15375
>   Volume Session Id:      56
>   Volume Session Time:    1740841409
>   Last Volume Bytes:      16,733,601 (16.73 MB)
>   Non-fatal FD errors:    3
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  Error
>   Termination:            *** Backup Error ***
>
>
> --
> Dan Langille
> d...@langille.org
>
>
> _______________________________________________
> Bacula-users mailing list
> Bacula-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/bacula-users
>
>
> --
> Dan Langille
> d...@langille.org