The reading side is the same system. It is a copy job set up to copy the
daily backups to the offsite backup disk.
The attachment is the log for Bacula JobId 35202.
jerry
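
For anyone lining up the two halves of the copy job, here is a minimal Python sketch that merges messages from the reading job (JobId 35202) and the writing job (JobId 35203) into one timeline. The sample lines are copied from this thread, rejoined where the mail client wrapped them. The names parse_line and merged_timeline are made up for illustration and are not part of Bacula, and the year is an assumption because the log prefix does not carry one.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the log prefix (e.g. "13-Sep").
MONTHS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Matches lines such as: 13-Sep 08:43 kilchis JobId 35203: Error: ...
LINE_RE = re.compile(
    r"^(\d{2})-([A-Za-z]{3}) (\d{2}):(\d{2}) (\S+) JobId (\d+): (.*)$"
)

def parse_line(line, year=2017):
    """Return (timestamp, daemon, jobid, message), or None if the line does not match.

    The year is an assumption: the log prefix only has day, month and time.
    """
    m = LINE_RE.match(line.strip())
    if m is None or m.group(2) not in MONTHS:
        return None
    day, mon, hour, minute, daemon, jobid, msg = m.groups()
    stamp = datetime(year, MONTHS[mon], int(day), int(hour), int(minute))
    return stamp, daemon, int(jobid), msg

def merged_timeline(lines):
    """Parse every matching line and sort by timestamp, then by JobId."""
    parsed = [p for p in map(parse_line, lines) if p is not None]
    return sorted(parsed, key=lambda p: (p[0], p[2]))

SAMPLE = [
    '13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from '
    'Storage daemon:kilchis:9103: ERR=Connection reset by peer',
    '12-Sep 15:09 kilchis-dir JobId 35202: Using Device "Home" to read.',
    '13-Sep 08:43 kilchis JobId 35202: End of Volume "home-7" at '
    'addr=358988868021 on device "Home" (/engineering/Home).',
    '13-Sep 08:23 kilchis JobId 35203: New volume "homeMS-202" mounted on '
    'device "MidSwap" (/MidSwap) at 13-Sep-2017 08:23.',
]

timeline = merged_timeline(SAMPLE)
for stamp, daemon, jobid, msg in timeline:
    print(stamp.strftime("%d-%b %H:%M"), jobid, msg[:50])
```

Sorted this way, the reader's "End of Volume" message and the writer's "Connection reset by peer" error land in the same minute (13-Sep 08:43), which is the correlation being asked about.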
On Tue, Sep 19, 2017 at 10:08 AM, Martin Simmons <mar...@lispworks.com>
wrote:
> The email below is from the writing side of the copy job and the message:
>
> 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from
> Storage daemon:kilchis:9103: ERR=Connection reset by peer
>
> shows that the connection to the reading side of the job was closed
> unexpectedly from the reading end.
>
> Do you have the corresponding email from the reading side? It will have a
> different JobId (but should mention JobId 35203) and should start with
> something like "Using Device ... to read."
>
> __Martin
>
>
> >>>>> On Mon, 18 Sep 2017 13:42:19 -0700, Jerry Lowry said:
> >
> > Martin,
> > Here is the complete email that was sent just before the "Copy Error"
> > message:
> >
> > 12-Sep 15:09 kilchis-dir JobId 35203: Using Device "MidSwap" to write.
> > 12-Sep 15:09 kilchis JobId 35203: Volume "homeMS-200" previously written, moving to end of data.
> > 12-Sep 15:27 kilchis JobId 35203: End of medium on Volume "homeMS-200" Bytes=1,932,735,274,146 Blocks=29,959,317 at 12-Sep-2017 15:27.
> > 12-Sep 15:28 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > Please use the "label" command to create a new Volume for:
> > Storage: "MidSwap" (/MidSwap)
> > Pool: OffsiteMid
> > Media type: File
> > 12-Sep 15:36 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-201" on File device "MidSwap" (/MidSwap)
> > 12-Sep 15:36 kilchis JobId 35203: New volume "homeMS-201" mounted on device "MidSwap" (/MidSwap) at 12-Sep-2017 15:36.
> > 12-Sep 19:54 kilchis JobId 35203: End of medium on Volume "homeMS-201" Bytes=1,932,735,281,790 Blocks=29,959,315 at 12-Sep-2017 19:54.
> > 12-Sep 19:54 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > Please use the "label" command to create a new Volume for:
> > Storage: "MidSwap" (/MidSwap)
> > Pool: OffsiteMid
> > Media type: File
> > 12-Sep 20:57 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > Please use the "label" command to create a new Volume for:
> > Storage: "MidSwap" (/MidSwap)
> > Pool: OffsiteMid
> > Media type: File
> > 12-Sep 23:03 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > Please use the "label" command to create a new Volume for:
> > Storage: "MidSwap" (/MidSwap)
> > Pool: OffsiteMid
> > Media type: File
> > 13-Sep 03:15 kilchis JobId 35203: Job BackupUsers.2017-09-12_09.05.09_50 is waiting. Cannot find any appendable volumes.
> > Please use the "label" command to create a new Volume for:
> > Storage: "MidSwap" (/MidSwap)
> > Pool: OffsiteMid
> > Media type: File
> > 13-Sep 08:23 kilchis JobId 35203: Wrote label to prelabeled Volume "homeMS-202" on File device "MidSwap" (/MidSwap)
> > 13-Sep 08:23 kilchis JobId 35203: New volume "homeMS-202" mounted on device "MidSwap" (/MidSwap) at 13-Sep-2017 08:23.
> > 13-Sep 08:43 kilchis JobId 35203: Error: bsock.c:849 Read error from Storage daemon:kilchis:9103: ERR=Connection reset by peer
> > 13-Sep 08:43 kilchis JobId 35203: Fatal error: append.c:271 Network error reading from FD. ERR=Connection reset by peer
> > 13-Sep 08:43 kilchis JobId 35203: Elapsed time=04:56:15, Transfer rate=125.6 M Bytes/second
> > 13-Sep 08:43 kilchis JobId 35203: Sending spooled attrs to the Director. Despooling 1,533,148,574 bytes ...
> >
> > I don't have the job log. Interestingly, I did not have any problems with
> > this or any other copy job before I upgraded. I went from 5.2.13 to 9.0.3
> > of Bacula and from the latest version of MySQL to MariaDB. Not saying that
> > this is a problem, because I have 5 other copy jobs that still work without
> > error. This one just happens to be the biggest one.
> >
> > thanks,
> > jerry
> >
> > On Mon, Sep 18, 2017 at 7:55 AM, Martin Simmons <mar...@lispworks.com>
> > wrote:
> >
> > > A copy job will communicate using TCP between the Bacula daemons. A bsock
> > > error could indicate that bacula-sd closed the connection unexpectedly, and
> > > I would expect media errors to be logged.
> > >
> > > Your syslog did include some I/O errors. Are they caused by something
> > > else?
> > >
> > > Do you have the complete job log (from the Bacula log, not the syslog)?
> > >
> > > __Martin
> > >
> > >
> > > >>>>> On Wed, 13 Sep 2017 09:35:07 -0700, Jerry Lowry said:
> > > >
> > > > Kern,
> > > > My Offsite Backup just failed again on the same drive, different disk. It
> > > > failed with the same bsock error. If the backup is working on the same
> > > > system using the copy function, how far out of the network stack does it
> > > > go? My thinking is it does not get out of the application layer. Is this
> > > > right? Why would I get a bsock error?
> > > >
> > > > I have taken a look at the SMART data for the disk and they seem to be
> > > > running okay. I am getting some sector relocation errors; would that cause
> > > > the bsock error during a remap? This procedure has been running flawlessly
> > > > for many years (except for human error). I am wondering if I should
> > > > delete the present disk files and let Bacula recreate new ones.
> > > >
> > > > thanks for your help!
> > > >
> > > > jerry
> > > >
> > > >
> > > > On Wed, Sep 6, 2017 at 11:26 PM, Kern Sibbald <k...@sibbald.com>
> wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > If the job is marked as Incomplete in the catalog ("I" I think), then you
> > > > > can simply restart it and it should pick up where it left off. If not, you
> > > > > must run it again from the beginning.
> > > > >
> > > > > If you are switching devices when one is full during a Job, it is unlikely
> > > > > you can restore that job when it terminates. I recommend carefully testing
> > > > > restores on your system.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Kern
> > > > >
> > > > > On 09/06/2017 05:38 PM, Jerry Lowry wrote:
> > > > >
> > > > > List,
> > > > > I am running Bacula 9.0.3 and MariaDB 12.2.8 on CentOS 6.9. I got notice
> > > > > last night that my Offsite backup failed due to a bsock error. My offsite
> > > > > drives are attached to an ATTO raid card which gives me hot swap
> > > > > capability. This configuration works great as it allows me to hot swap a
> > > > > drive when it fills up with a new drive to continue with. The problem is
> > > > > included below. The backup that I was doing is to the OffsiteMid drive
> > > > > which is mounted as /dev/sde. Is there a way to restart this backup job, or
> > > > > am I left with an incomplete backup going forward?
> > > > >
> > > > > thanks for your help,
> > > > >
> > > > > jerry
> > > > >
> > > > >
> > > > > Sep 5 08:46:01 kilchis bat[4339]: bsock.c:147 Unable to connect to Director daemon on kilchis:9101. ERR=Connection refused
> > > > > Sep 5 10:37:20 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > Sep 5 10:39:06 kilchis kernel: scsi 5:0:1:0: Direct-Access ATTO OffsiteTop00 0001 PQ: 0 ANSI: 5
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write Protect is off
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 10:39:06 kilchis kernel: sdd: unknown partition table
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 10:39:06 kilchis kernel: sd 5:0:1:0: [sdd] Attached SCSI disk
> > > > > Sep 5 10:39:35 kilchis kernel: sd 5:0:1:0: [sdd] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 10:39:35 kilchis kernel: sdd:
> > > > > Sep 5 10:44:54 kilchis kernel: EXT4-fs (sdd): mounted filesystem with ordered data mode. Opts:
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 11:02:38 kilchis bacula-dir[4373]: bsock.c:537 Socket has errors=1 on call to client:10.20.10.21:9101
> > > > > Sep 5 13:45:48 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteMid
> > > > > Sep 5 13:45:53 kilchis attocfgd: [CRIT] [ExpressSAS R608,50:01:08:60:00:57:3d:c0] [FW] RAID Group state now Offline: OffsiteTop
> > > > > Sep 5 13:47:52 kilchis kernel: scsi 5:0:1:0: Direct-Access ATTO OffsiteMid00 0001 PQ: 0 ANSI: 5
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: Attached scsi generic sg6 type 0
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write Protect is off
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 13:47:52 kilchis kernel: sde: unknown partition table
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 13:47:52 kilchis kernel: sd 5:0:1:0: [sde] Attached SCSI disk
> > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs error (device sdd): __ext4_get_inode_loc: unable to read inode block - inode=2, block=1057
> > > > > Sep 5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > Sep 5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs error (device sdd) in ext4_reserve_inode_write: IO failure
> > > > > Sep 5 13:48:01 kilchis kernel: EXT4-fs (sdd): previous I/O error to superblock detected
> > > > > Sep 5 13:48:01 kilchis kernel: Buffer I/O error on device sdd, logical block 0
> > > > > Sep 5 13:48:01 kilchis kernel: lost page write due to I/O error on sdd
> > > > > Sep 5 13:48:06 kilchis kernel: Aborting journal on device sdd-8.
> > > > > Sep 5 13:48:06 kilchis kernel: Buffer I/O error on device sdd, logical block 243826688
> > > > > Sep 5 13:48:06 kilchis kernel: lost page write due to I/O error on sdd
> > > > > Sep 5 13:48:06 kilchis kernel: JBD2: I/O error detected when updating journal superblock for sdd-8.
> > > > > Sep 5 13:48:08 kilchis kernel: EXT4-fs error (device sdd): ext4_put_super: Couldn't clean up the journal
> > > > > Sep 5 13:48:08 kilchis kernel: EXT4-fs (sdd): Remounting filesystem read-only
> > > > > Sep 5 13:48:44 kilchis kernel: sd 5:0:1:0: [sde] 488366336 4096-byte logical blocks: (2.00 TB/1.81 TiB)
> > > > > Sep 5 13:48:44 kilchis kernel: sde:
> > > > > Sep 5 13:54:05 kilchis kernel: EXT4-fs (sde): mounted filesystem with ordered data mode. Opts:
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > >
> >
>
12-Sep 09:05 kilchis-dir JobId 35202: Copying using JobId=35175 Job=BackupUsers.2017-09-09_20.05.00_20
12-Sep 15:09 kilchis-dir JobId 35202: Start Copying JobId 35202, Job=CopyHMDiskToDisk.2017-09-12_09.05.08_49
12-Sep 15:09 kilchis-dir JobId 35202: Using Device "Home" to read.
12-Sep 15:09 kilchis JobId 35202: Ready to read from volume "home-4" on File device "Home" (/engineering/Home).
12-Sep 15:09 kilchis JobId 35202: Forward spacing Volume "home-4" to addr=56727564120
12-Sep 15:41 kilchis JobId 35202: End of Volume "home-4" at addr=214748348802 on device "Home" (/engineering/Home).
12-Sep 15:41 kilchis JobId 35202: Ready to read from volume "home-6" on File device "Home" (/engineering/Home).
12-Sep 15:41 kilchis JobId 35202: Forward spacing Volume "home-6" to addr=215
12-Sep 16:04 kilchis JobId 35202: End of Volume "home-6" at addr=214748314935 on device "Home" (/engineering/Home).
12-Sep 16:04 kilchis JobId 35202: Ready to read from volume "home-8" on File device "Home" (/engineering/Home).
12-Sep 16:04 kilchis JobId 35202: Forward spacing Volume "home-8" to addr=215
12-Sep 19:23 kilchis JobId 35202: End of Volume "home-8" at addr=1503238526521 on device "Home" (/engineering/Home).
12-Sep 19:23 kilchis JobId 35202: Ready to read from volume "home-7" on File device "Home" (/engineering/Home).
12-Sep 19:23 kilchis JobId 35202: Forward spacing Volume "home-7" to addr=215
13-Sep 08:43 kilchis JobId 35202: End of Volume "home-7" at addr=358988868021 on device "Home" (/engineering/Home).
13-Sep 08:43 kilchis JobId 35202: Elapsed time=17:33:58, Transfer rate=35.31 M Bytes/second
13-Sep 09:05 kilchis-dir JobId 35202: Error: Bacula kilchis-dir 9.0.3 (08Aug17):
Build OS: x86_64-pc-linux-gnu redhat
Prev Backup JobId: 35175
Prev Backup Job: BackupUsers.2017-09-09_20.05.00_20
New Backup JobId: 35203
Current JobId: 35202
Current Job: CopyHMDiskToDisk.2017-09-12_09.05.08_49
Backup Level: Full
Client: kilchis-fd
FileSet: "Mid Set" 2011-04-11 13:13:32
Read Pool: "HomePool" (From Command input)
Read Storage: "home" (From Job resource)
Write Pool: "OffsiteMid" (From Command input)
Write Storage: "midswap" (From Command input)
Catalog: "MyCatalog" (From Client resource)
Start time: 12-Sep-2017 15:09:48
End time: 13-Sep-2017 09:05:10
Elapsed time: 17 hours 55 mins 22 secs
Priority: 10
SD Files Written: 5,227,291
SD Bytes Written: 2,233,012,065,605 (2.233 TB)
Rate: 34608.5 KB/s
Volume name(s): homeMS-200|homeMS-201|homeMS-202
Volume Session Id: 40
Volume Session Time: 1504817299
Last Volume Bytes: 194,437,619,256 (194.4 GB)
SD Errors: 0
SD termination status: OK
Termination: *** Copying Error ***
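
As a quick sanity check on the summary above, a minimal Python sketch that recomputes the elapsed time and the average rate from the reported start time, end time, and SD Bytes Written. The inputs are copied from the summary; treating "KB" as 1000 bytes is an assumption made here so the result can be compared with the reported figure.

```python
from datetime import datetime

# Values copied from the job summary for JobId 35202.
start = datetime(2017, 9, 12, 15, 9, 48)   # Start time
end = datetime(2017, 9, 13, 9, 5, 10)      # End time
sd_bytes_written = 2_233_012_065_605       # SD Bytes Written

elapsed_seconds = (end - start).total_seconds()

# Assumption: 1 KB = 1000 bytes.
rate_kb_per_s = sd_bytes_written / elapsed_seconds / 1000

hours, rest = divmod(int(elapsed_seconds), 3600)
mins, secs = divmod(rest, 60)
print(f"Elapsed: {hours} hours {mins} mins {secs} secs")  # 17 hours 55 mins 22 secs
print(f"Rate: {rate_kb_per_s:.1f} KB/s")                  # 34608.5 KB/s
```

Both numbers agree with the summary, so the reported rate is averaged over the whole wall-clock run, including the hours the writing job spent waiting for a new volume.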
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users