Hi, 08.08.2007 09:30,, Ralf Gross wrote:: > Hi, > > while looking for my verify problems, I noticed that my full backup had errors > at the end of the first tape but terminated with status OK. Therefor I noticed > this error very late.
But you actually kept and read the system log! Congratulations ;-) > /var/log/kern.log > Aug 5 06:00:29 VU0EM005 kernel: st0: Current: sense key: Aborted Command > Aug 5 06:00:29 VU0EM005 kernel: Additional sense: Data phase error > > > 05-Aug 02:13 VU0EM005: Start Backup JobId 416, > Job=VU0EM003.2007-08-05_00.05.01 > 05-Aug 02:13 VU0EM005: Spooling data ... > VU0EM003: /home is a different filesystem. Will not descend from / into > /home > VU0EM003: /public is a different filesystem. Will not descend from / > into /public > VU0EM003: /lib/init/rw is a different filesystem. Will not descend from > / into /lib/init/rw > VU0EM003: /server is a different filesystem. Will not descend from / > into /server > 05-Aug 04:44 VU0EM005: User specified spool size reached. > 05-Aug 04:44 VU0EM005: Writing spooled data to Volume. Despooling > 450,971,582,471 bytes ... > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:569 Write > error at 405:11595 on device "LTO3" (/dev/ULTRIUM-TD3). > ERR=Eingabe-/Ausgabefehler. > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: block.c:317 Volume > data error at 405:4294967295! > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 blk=20b379e5 > 05-Aug 06:00 VU0EM005: VU0EM003.2007-08-05_00.05.01 Error: Re-read last block > at EOT failed. ERR=block.c:317 Volume data error at 405:4294967295! > Block checksum mismatch in block=5235094 len=64512: calc=2a90e8e7 blk=20b379e5 > 05-Aug 06:00 VU0EM005: End of medium on Volume "06D128L3" > Bytes=404,723,386,368 Blocks=6,273,613 at 05-Aug-2007 06:00. Ok, this is unverified by me, but I *believe* that in these cases Bacul does the right thing, i.e. it writes the block in question to the next tape again. At least I hope this is the point of doing the re-read of the block where an error occured... And it would explain why this is not an error from the job's point of view. > 05-Aug 06:02 VU0EM005: Using Volume "06D138L3" from 'Scratch' pool. > 05-Aug 06:02 VU0EM005: 3307 Issuing autochanger "unload slot 9, drive 0" > command. > 05-Aug 06:03 VU0EM005: 3304 Issuing autochanger "load slot 19, drive 0" > command. > 05-Aug 06:03 VU0EM005: 3305 Autochanger "load slot 19, drive 0", status is OK. > 05-Aug 06:03 VU0EM005: 3301 Issuing autochanger "loaded? drive 0" command. > 05-Aug 06:03 VU0EM005: 3302 Autochanger "loaded? drive 0", result is Slot 19. > 05-Aug 06:03 VU0EM005: Recycled volume "06D138L3" on device "LTO3" > (/dev/ULTRIUM-TD3), all previous data lost. > 05-Aug 06:03 VU0EM005: New volume "06D138L3" mounted on device "LTO3" > (/dev/ULTRIUM-TD3) at 05-Aug-2007 06:03. > 05-Aug 06:31 VU0EM005: Despooling elapsed time = 01:44:20, Transfer rate = > 72.04 M bytes/second > 05-Aug 06:42 VU0EM005: Spooling data again ... > 05-Aug 08:12 VU0EM005: Job write elapsed time = 05:55:07, Transfer rate = > 33.16 M bytes/second > 05-Aug 08:12 VU0EM005: Committing spooled data to Volume "06D138L3". > Despooling 256,418,397,693 bytes ... > 05-Aug 09:13 VU0EM005: Despooling elapsed time = 01:01:30, Transfer rate = > 69.49 M bytes/second > 05-Aug 09:20 VU0EM005: Alert: smartctl version 5.36 > [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen > 05-Aug 09:20 VU0EM005: Alert: Home page is > http://smartmontools.sourceforge.net/ > 05-Aug 09:20 VU0EM005: Alert: > 05-Aug 09:20 VU0EM005: Alert: TapeAlert: OK > 05-Aug 09:20 VU0EM005: Alert: > 05-Aug 09:20 VU0EM005: Alert: Error Counter logging not supported > 05-Aug 09:20 VU0EM005: Alert: > 05-Aug 09:20 VU0EM005: Alert: Non-medium error count: 0 > 05-Aug 09:20 VU0EM005: Sending spooled attrs to the Director. Despooling > 600,205,128 bytes ... > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43 > 05-Aug 09:37 VU0EM005: Bacula 2.0.3 (06Mar07): 05-Aug-2007 09:37:43 > JobId: 416 > Job: VU0EM003.2007-08-05_00.05.01 > Backup Level: Full > Client: "VU0EM003" 2.0.2 (28Jan07) > x86_64-unknown-linux-gnu,debian,4.0 > FileSet: "VU0EM003" 2007-06-12 00:05:01 > Pool: "Full" (From Run pool override) > Storage: "NEC-T40A" (From Job resource) > Scheduled time: 05-Aug-2007 00:05:00 > Start time: 05-Aug-2007 02:13:49 > End time: 05-Aug-2007 09:37:43 > Elapsed time: 7 hours 23 mins 54 secs > Priority: 10 > FD Files Written: 1,631,258 > SD Files Written: 1,631,258 > FD Bytes Written: 706,308,678,026 (706.3 GB) > SD Bytes Written: 706,666,406,084 (706.6 GB) > Rate: 26519.1 KB/s > Software Compression: None > VSS: no > Encryption: no > Volume name(s): 06D128L3|06D138L3 > Volume Session Id: 106 > Volume Session Time: 1184053226 > Last Volume Bytes: 369,532,025,856 (369.5 GB) > Non-fatal FD errors: 0 > SD Errors: 0 > FD termination status: OK > SD termination status: OK > Termination: Backup OK > > > > I've rerun the btape test (test/fill). No problems. At the moment I'm running > the full backup to the same tapes again. The end of the first tape was > detected > without Block checksum mismatch. You could have tried a restore of the job with the (possible) error... Regarding the tape write/read error, I believe these happen from time to time, especially when related to end of tape detection (400GB on LTO3 seems like the tape was actually full, but for some reason the drive didn't report the approaching EOT to the OS). I would worry if this happened regularly, but if it's a one-time glitch, I think there's no problem. Arno > Maybe it was just bad luck, but why did the job terminate with OK? I got no > mail, because bacula is configured to only send mail on error. > > Ralf > > ------------------------------------------------------------------------- > This SF.net email is sponsored by: Splunk Inc. > Still grepping through log files to find problems? Stop. > Now Search log events and configuration files using AJAX and a browser. > Download your FREE copy of Splunk now >> http://get.splunk.com/ > _______________________________________________ > Bacula-users mailing list > Bacula-users@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/bacula-users -- Arno Lehmann IT-Service Lehmann www.its-lehmann.de ------------------------------------------------------------------------- This SF.net email is sponsored by: Splunk Inc. Still grepping through log files to find problems? Stop. Now Search log events and configuration files using AJAX and a browser. Download your FREE copy of Splunk now >> http://get.splunk.com/ _______________________________________________ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users