You can get "Connection reset by peer" on Linux even within a single process as follows:
0. Suppose A and B are two file descriptors connected by a socket. 1. Write a byte to A. 2. Close B. 3. Read a byte from A. This fails with "Connection reset by peer" rather than getting EOF. You do get EOF if you omit step 1. There might be other related situations. I'm not 100% sure how this happens in Bacula, but it might be something like: 0. The reading SD and writing SD threads are connected by a socket. 1. The writing SD wants a new tape, so it calls wait_for_sysop. 1a. This calls jcr->file_bsock->signal(BNET_HEARTBEAT), which writes a byte to the socket. 1b. The new tape is loaded and wait_for_sysop returns. 1c. The writing SD continues its loop in do_append_data, reading from the socket. 2. The reading SD reaches the end of the old data and closes the socket. 3. The writing SD is still reading in do_append_data and so gets "Connection reset by peer". As you say, waiting for a new Volume is not unusual, but the difference is that I think nothing wil read the BNET_HEARTBEAT in a Copy job. For a Backup job, the FD (acting as the reading SD above) doesn't close the socket until it has synchronized with the writing SD. I have no way to reproduce this, but the theory could be tested if Andras can run a patched SD with wait_for_sysop changed to skip the heartbeat when running a Copy job. __Martin >>>>> On Fri, 10 May 2019 14:55:53 +0200, Kern Sibbald said: > > Hello Martin, > > This is an interesting reflection. Do you think it is a timeout, or an > out and out bug where Bacula gets confused with additional > communications? A bug would be a bit hard to understand, because the > SD often waits for a Volume to be mounted -- of course, there can > certainly be bug for a copy job where two separate Volumes are involved. > > Do you have any way to easily reproduce this? > > Best regards, > Kern > > On 5/10/19 1:25 PM, Martin Simmons wrote: > > I'm pretty sure the "Connection reset by peer" error is a Bacula bug, > > triggered when a Copy job waits in the middle for a new tape to write. > > > > This is causing the Copying Error status. > > > > __Martin > > > > > > > >>>>>> On Fri, 10 May 2019 00:19:54 +0200, Andras Horvai said: > >> hi, > >> > >> anybody, any idea regarding this error? Why the termination status of the > >> previous job was: *** Copying Error *** ? > >> > >> Thanks, > >> > >> Andras > >> > >> On Wed, May 8, 2019 at 4:11 PM Andras Horvai <andras.hor...@gmail.com> > >> wrote: > >> > >>> you got the point, here it is another error message: > >>> > >>> 06-May 12:01 backup2-dir JobId 1038: Using Device "LTO-6" to write. > >>> 06-May 12:07 backup2-sd JobId 1038: [SI0202] End of Volume "WORMW-1181" at > >>> 1424:28456 on device "LTO-6" (/dev/nst0). Write of 64512 bytes got -1. > >>> 06-May 12:07 backup2-sd JobId 1038: Re-read of last block succeeded. > >>> 06-May 12:07 backup2-sd JobId 1038: End of medium on Volume "WORMW-1181" > >>> Bytes=2,820,363,420,672 Blocks=43,718,430 at 06-May-2019 12:07. > >>> 06-May 12:08 backup2-dir JobId 1038: Created new Volume="WORMW-1182", > >>> Pool="TapeArchive", MediaType="LTO-6" in catalog. > >>> 06-May 12:08 backup2-sd JobId 1038: Please mount append Volume > >>> "WORMW-1182" or label a new one for: > >>> Job: srv1-job.2019-05-06_09.00.01_21 > >>> Storage: "LTO-6" (/dev/nst0) > >>> Pool: TapeArchive > >>> Media type: LTO-6 > >>> 06-May 12:46 backup2-sd JobId 1038: Error: [SE0203] The Volume=WORMW-1182 > >>> on device="LTO-6" (/dev/nst0) appears to be unlabeled. > >>> 06-May 12:47 backup2-sd JobId 1038: Labeled new Volume "WORMW-1182" on > >>> Tape device "LTO-6" (/dev/nst0). > >>> 06-May 12:47 backup2-sd JobId 1038: Wrote label to prelabeled Volume > >>> "WORMW-1182" on Tape device "LTO-6" (/dev/nst0) > >>> 06-May 12:47 backup2-sd JobId 1038: New volume "WORMW-1182" mounted on > >>> device "LTO-6" (/dev/nst0) at 06-May-2019 12:47. > >>> 06-May 12:56 backup2-sd JobId 1038: Fatal error: append.c:170 Error > >>> reading data header from FD. n=-2 msglen=0 ERR=Connection reset by peer > >>> 06-May 12:56 backup2-sd JobId 1038: Elapsed time=00:14:48, Transfer > >>> rate=68.06 M Bytes/second > >>> 06-May 12:56 backup2-sd JobId 1038: Sending spooled attrs to the Director. > >>> Despooling 27,981,780 bytes ... > >>> > >>> so why I got Connection reset by peer message? SD,FD,Director is on the > >>> same machine (in the case of Copy jobs) > >>> > >>> Thanks, > >>> Andras > >>> > >>> On Wed, May 8, 2019 at 3:10 PM Martin Simmons <mar...@lispworks.com> > >>> wrote: > >>> > >>>> That look clean. > >>>> > >>>> Are there any messages for the "New Backup JobId" (1038)? I find them > >>>> printed > >>>> after the "Termination:" line for the copy job. > >>>> > >>>> __Martin > >>>> > >>>> > >>>>>>>>> On Wed, 8 May 2019 14:32:31 +0200, Andras Horvai said: >>>>> hi, > >>>>> >>>>> here is the snipped part: :) > >>>>> >>>>> 06-May 09:00 backup2-dir JobId 1037: Copying using JobId=1016 >>>>> Job=srv1-job.2019-05-04_02.00.00_59 >>>>> 06-May 12:01 backup2-dir JobId 1037: Start Copying JobId 1037, >>>>> Job=ArchiveJob.2019-05-06_09.00.01_20 >>>>> 06-May 12:01 backup2-dir JobId 1037: Using Device "FileStorage" to read. >>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1006" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1006" > >>>> to >>>>> addr=531212699 >>>>> 06-May 12:01 backup2-sd JobId 1037: End of Volume "FILEW-1006" at >>>>> addr=2147431799 on device "FileStorage" (/backup). >>>>> 06-May 12:01 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1007" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:01 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1007" > >>>> to >>>>> addr=238 >>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1007" at >>>>> addr=2147475513 on device "FileStorage" (/backup). >>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1008" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1008" > >>>> to >>>>> addr=238 >>>>> 06-May 12:02 backup2-sd JobId 1037: End of Volume "FILEW-1008" at >>>>> addr=2147475637 on device "FileStorage" (/backup). >>>>> 06-May 12:02 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1009" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:02 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1009" > >>>> to >>>>> addr=238 >>>>> 06-May 12:03 backup2-sd JobId 1037: End of Volume "FILEW-1009" at >>>>> addr=2147475644 on device "FileStorage" (/backup). >>>>> 06-May 12:03 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1010" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:03 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1010" > >>>> to >>>>> addr=238 >>>>> 06-May 12:03 backup2-sd JobId 1037: End of Volume "FILEW-1010" at >>>>> addr=2147475667 on device "FileStorage" (/backup). >>>>> 06-May 12:03 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1011" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:03 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1011" > >>>> to >>>>> addr=238 >>>>> 06-May 12:04 backup2-sd JobId 1037: End of Volume "FILEW-1011" at >>>>> addr=2147475655 on device "FileStorage" (/backup). >>>>> 06-May 12:04 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1012" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:04 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1012" > >>>> to >>>>> addr=238 >>>>> 06-May 12:05 backup2-sd JobId 1037: End of Volume "FILEW-1012" at >>>>> addr=2147475652 on device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1013" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1013" > >>>> to >>>>> addr=238 >>>>> 06-May 12:05 backup2-sd JobId 1037: End of Volume "FILEW-1013" at >>>>> addr=2147475669 on device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1014" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1014" > >>>> to >>>>> addr=238 >>>>> 06-May 12:05 backup2-sd JobId 1037: End of Volume "FILEW-1014" at >>>>> addr=2147475654 on device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1015" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:05 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1015" > >>>> to >>>>> addr=238 >>>>> 06-May 12:06 backup2-sd JobId 1037: End of Volume "FILEW-1015" at >>>>> addr=2147475661 on device "FileStorage" (/backup). >>>>> 06-May 12:06 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1016" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:06 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1016" > >>>> to >>>>> addr=238 >>>>> 06-May 12:06 backup2-sd JobId 1037: End of Volume "FILEW-1016" at >>>>> addr=2147475671 on device "FileStorage" (/backup). >>>>> 06-May 12:06 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1017" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:06 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1017" > >>>> to >>>>> addr=238 >>>>> 06-May 12:07 backup2-sd JobId 1037: End of Volume "FILEW-1017" at >>>>> addr=2147475664 on device "FileStorage" (/backup). >>>>> 06-May 12:07 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1018" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:07 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1018" > >>>> to >>>>> addr=238 >>>>> 06-May 12:47 backup2-sd JobId 1037: End of Volume "FILEW-1018" at >>>>> addr=2147475651 on device "FileStorage" (/backup). >>>>> 06-May 12:47 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1019" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:47 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1019" > >>>> to >>>>> addr=238 >>>>> 06-May 12:48 backup2-sd JobId 1037: End of Volume "FILEW-1019" at >>>>> addr=2147475672 on device "FileStorage" (/backup). >>>>> 06-May 12:48 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1020" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:48 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1020" > >>>> to >>>>> addr=238 >>>>> 06-May 12:49 backup2-sd JobId 1037: End of Volume "FILEW-1020" at >>>>> addr=2147475677 on device "FileStorage" (/backup). >>>>> 06-May 12:49 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1021" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:49 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1021" > >>>> to >>>>> addr=238 >>>>> 06-May 12:49 backup2-sd JobId 1037: End of Volume "FILEW-1021" at >>>>> addr=2147475654 on device "FileStorage" (/backup). >>>>> 06-May 12:49 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1022" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:49 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1022" > >>>> to >>>>> addr=238 >>>>> 06-May 12:50 backup2-sd JobId 1037: End of Volume "FILEW-1022" at >>>>> addr=2147475671 on device "FileStorage" (/backup). >>>>> 06-May 12:50 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1023" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:50 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1023" > >>>> to >>>>> addr=238 >>>>> 06-May 12:50 backup2-sd JobId 1037: End of Volume "FILEW-1023" at >>>>> addr=2147475674 on device "FileStorage" (/backup). >>>>> 06-May 12:50 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1024" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:50 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1024" > >>>> to >>>>> addr=238 >>>>> 06-May 12:51 backup2-sd JobId 1037: End of Volume "FILEW-1024" at >>>>> addr=2147475677 on device "FileStorage" (/backup). >>>>> 06-May 12:51 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1025" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:51 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1025" > >>>> to >>>>> addr=238 >>>>> 06-May 12:51 backup2-sd JobId 1037: End of Volume "FILEW-1025" at >>>>> addr=2147475654 on device "FileStorage" (/backup). >>>>> 06-May 12:51 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1026" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:51 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1026" > >>>> to >>>>> addr=238 >>>>> 06-May 12:52 backup2-sd JobId 1037: End of Volume "FILEW-1026" at >>>>> addr=2147475694 on device "FileStorage" (/backup). >>>>> 06-May 12:52 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1027" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:52 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1027" > >>>> to >>>>> addr=238 >>>>> 06-May 12:53 backup2-sd JobId 1037: End of Volume "FILEW-1027" at >>>>> addr=2147475665 on device "FileStorage" (/backup). >>>>> 06-May 12:53 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1028" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:53 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1028" > >>>> to >>>>> addr=238 >>>>> 06-May 12:53 backup2-sd JobId 1037: End of Volume "FILEW-1028" at >>>>> addr=2147475650 on device "FileStorage" (/backup). >>>>> 06-May 12:53 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1029" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:53 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1029" > >>>> to >>>>> addr=238 >>>>> 06-May 12:54 backup2-sd JobId 1037: End of Volume "FILEW-1029" at >>>>> addr=2147475649 on device "FileStorage" (/backup). >>>>> 06-May 12:54 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1030" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:54 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1030" > >>>> to >>>>> addr=238 >>>>> 06-May 12:54 backup2-sd JobId 1037: End of Volume "FILEW-1030" at >>>>> addr=2147475651 on device "FileStorage" (/backup). >>>>> 06-May 12:54 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1031" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:54 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1031" > >>>> to >>>>> addr=238 >>>>> 06-May 12:55 backup2-sd JobId 1037: End of Volume "FILEW-1031" at >>>>> addr=2147475636 on device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1032" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1032" > >>>> to >>>>> addr=238 >>>>> 06-May 12:55 backup2-sd JobId 1037: End of Volume "FILEW-1032" at >>>>> addr=2147475667 on device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1033" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1033" > >>>> to >>>>> addr=238 >>>>> 06-May 12:55 backup2-sd JobId 1037: End of Volume "FILEW-1033" at >>>>> addr=2147475633 on device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Ready to read from volume > >>>> "FILEW-1034" >>>>> on File device "FileStorage" (/backup). >>>>> 06-May 12:55 backup2-sd JobId 1037: Forward spacing Volume "FILEW-1034" > >>>> to >>>>> addr=238 >>>>> 06-May 12:56 backup2-sd JobId 1037: End of Volume "FILEW-1034" at >>>>> addr=895179674 on device "FileStorage" (/backup). >>>>> 06-May 12:56 backup2-sd JobId 1037: Elapsed time=00:54:40, Transfer >>>>> rate=18.42 M Bytes/second >>>>> 06-May 12:56 backup2-dir JobId 1037: Error: Bacula backup2-dir 9.4.1 >>>>> (20Dec18): > >>>>> > >>>>> > >>>>> > >>>>> >>>>> On Tue, May 7, 2019 at 6:51 PM Martin Simmons <mar...@lispworks.com> > >>>> wrote: > >>>>>> Maybe there is an error message in the text you snipped? > >>>>>> > >>>>>> __Martin > >>>>>> > >>>>>> > >>>>>>>>>>> On Mon, 6 May 2019 23:09:46 +0200, Andras Horvai said: > >>>>>>> Dear List, > >>>>>>> > >>>>>>> does anyone have any clue why I got director error after a tape > >>>> change: > >>>>>> (I > >>>>>>> am using copy job to copy full backups (from file storage) to tape. > >>>>>>> > >>>>>>> Tape change message: > >>>>>>> > >>>>>>> Subject: Bacula: Intervention needed for > >>>> srv1-job.2019-05-06_09.00.01_21 > >>>>>>> 06-May 12:08 backup2-sd JobId 1038: Please mount append Volume > >>>>>> "WORMW-1182" > >>>>>>> or label a new one for: > >>>>>>> Job: srv1-job.2019-05-06_09.00.01_21 > >>>>>>> Storage: "LTO-6" (/dev/nst0) > >>>>>>> Pool: TapeArchive > >>>>>>> Media type: LTO-6 > >>>>>>> > >>>>>>> > >>>>>>> Then comes this: > >>>>>>> > >>>>>>> > >>>>>>> 06-May 09:00 backup2-dir JobId 1037: Copying using JobId=1016 > >>>>>>> Job=srv1-job.2019-05-04_02.00.00_59 > >>>>>>> 06-May 12:01 backup2-dir JobId 1037: Start Copying JobId 1037, > >>>>>>> Job=ArchiveJob.2019-05-06_09.00.01_20 > >>>>>>> 06-May 12:01 backup2-dir JobId 1037: Using Device "FileStorage" to > >>>> read. > >>>>>>> ...snip... > >>>>>>> > >>>>>>> > >>>>>>> 06-May 12:56 backup2-sd JobId 1037: End of Volume "FILEW-1034" at > >>>>>>> addr=895179674 on device "FileStorage" (/backup). > >>>>>>> 06-May 12:56 backup2-sd JobId 1037: Elapsed time=00:54:40, Transfer > >>>>>>> rate=18.42 M Bytes/second > >>>>>>> 06-May 12:56 backup2-dir JobId 1037: Error: Bacula backup2-dir 9.4.1 > >>>>>>> (20Dec18): > >>>>>>> Build OS: x86_64-pc-linux-gnu ubuntu 18.04 > >>>>>>> Prev Backup JobId: 1016 > >>>>>>> Prev Backup Job: srv1-job.2019-05-04_02.00.00_59 > >>>>>>> New Backup JobId: 1038 > >>>>>>> Current JobId: 1037 > >>>>>>> Current Job: ArchiveJob.2019-05-06_09.00.01_20 > >>>>>>> Backup Level: Full > >>>>>>> Client: Archiver > >>>>>>> FileSet: "None" 2019-02-02 23:14:56 > >>>>>>> Read Pool: "ServersWeeklyFullFile" (From Command > >>>> input) > >>>>>>> Read Storage: "File" (From Pool resource) > >>>>>>> Write Pool: "TapeArchive" (From Command input) > >>>>>>> Write Storage: "LTO-6" (From Command input) > >>>>>>> Catalog: "MyCatalog" (From Client resource) > >>>>>>> Start time: 06-May-2019 12:01:27 > >>>>>>> End time: 06-May-2019 12:56:13 > >>>>>>> Elapsed time: 54 mins 46 secs > >>>>>>> Priority: 13 > >>>>>>> SD Files Written: 104,642 > >>>>>>> SD Bytes Written: 60,444,227,085 (60.44 GB) > >>>>>>> Rate: 18394.5 KB/s > >>>>>>> Volume name(s): WORMW-1181|WORMW-1182 > >>>>>>> Volume Session Id: 728 > >>>>>>> Volume Session Time: 1551107460 > >>>>>>> Last Volume Bytes: 34,658,297,856 (34.65 GB) > >>>>>>> SD Errors: 0 > >>>>>>> SD termination status: OK > >>>>>>> Termination: *** Copying Error *** > >>>>>>> > >>>>>>> > >>>>>>> Thanks for clarification! > >>>>>>> > >>>>>>> Andras > >>>>>>> > > > > _______________________________________________ > > Bacula-users mailing list > > Bacula-users@lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/bacula-users > > > > _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users