Hi all, we have have hit upon a conundrum in our (so far very satisfying) use of Bacula.
Our central file server crashed while Bacula 1.38.4 was writing Job 150 to tape: | 150 | DefaultBackup | 2006-04-27 04:05:02 | B | D | 0 | 0 | R | After restarting (and updating the kernel [different story]), we manually reran the backup jobs to the same tape (Thursday-0001), which completed successfully: | 152 | DefaultBackup | 2006-04-27 14:19:14 | B | D | 221770 | 5623183006 | T | | 153 | BackupCatalog | 2006-04-27 14:53:12 | B | F | 1 | 939894373 | T | Note that there is no entry for job 151, which would have been the Catalog backup for Job 150. However, attempting to restore any job past the failed job 150 is no longer possible. Here's a log for a restore attempt of job 152 03-May 14:09 erebor-dir: Start Restore Job RestoreFiles.2006-05-03_14.09.14 03-May 14:09 erebor-sd: Ready to read from volume "Thursday-0001" on device "LTO-2" (/dev/nst0). 03-May 14:09 erebor-sd: Forward spacing to file:block 276:0. 03-May 14:10 erebor-sd: RestoreFiles.2006-05-03_14.09.14 Error: block.c:263 Volume data error at 276:8671! Wanted ID: "BB02", got "*". Buffer discarded. 03-May 14:10 erebor-dir: RestoreFiles.2006-05-03_14.09.14 Error: Bacula 1.38.8 (14Apr06): 03-May-2006 14:10:57 Similarly, a bls -j on the volume Thursday-0001 ends with: Begin Job Session Record: JobId : 150 VerNum : 11 PoolName : ThursdayTapePool PoolType : Backup JobName : DefaultBackup ClientName : erebor-fd Job (unique name) : DefaultBackup.2006-04-27_04.05.00 FileSet : FullSet JobType : B JobLevel : D Date written : 27-Apr-2006 09:58 02-May 19:06 bls: Got EOF at file 272 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 02-May 19:07 bls: Got EOF at file 273 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 02-May 19:07 bls: Got EOF at file 274 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 02-May 19:07 bls: Got EOF at file 275 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 02-May 19:07 bls: Got EOF at file 276 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 02-May 19:07 bls: bls Error: block.c:263 Volume data error at 276:8671! Wante d ID: "BB02", got "*". Buffer discarded. Bacula status: file=276 block=8671 Device status: WR_PROT ONLINE IM_REP_EN file=276 block=8672 A rather desperate attempt to fix the media record with bscan -m (now updated to 1.38.8) also fails with: bscan: bscan.c:487 SOS_LABEL: Found Job record for JobId: 0 03-May 16:02 bscan: End of file 271 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" bscan: bscan.c:487 SOS_LABEL: Found Job record for JobId: 0 bscan: bscan.c:669 4,784,128 file records. At file:blk=271:3,760 bytes=234,529,489,158 bscan: bscan.c:669 4,816,896 file records. At file:blk=271:6,236 bytes=234,689,085,090 03-May 16:02 bscan: End of file 272 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 03-May 16:03 bscan: End of file 273 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" bscan: bscan.c:669 4,849,664 file records. At file:blk=273:36,561 bytes=236,644,324,571 bscan: bscan.c:669 4,882,432 file records. At file:blk=273:43,492 bytes=237,091,261,422 03-May 16:03 bscan: End of file 274 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 03-May 16:03 bscan: End of file 275 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" 03-May 16:03 bscan: End of file 276 on device "LTO-2" (/dev/nst0), Volume "Thursday-0001" bscan: bscan.c:669 4,915,200 file records. At file:blk=276:82,665 bytes=239,616,955,619 bscan: bscan.c:669 4,947,968 file records. At file:blk=276:83,527 bytes=239,672,522,633 bscan: bscan.c:669 4,980,736 file records. At file:blk=276:84,596 bytes=239,741,454,494 03-May 16:04 bscan: bscan Error: block.c:263 Volume data error at 276:8671! Wanted ID: "BB02", got "*". Buffer discarded. Bacula status: file=276 block=8671 Device status: WR_PROT ONLINE IM_REP_EN file=276 block=8672 Records would have been added or updated in the catalog: 1 Media 1 Pool 69 Job 4982796 File There are at least two aspects to all this. First, how can we continue to use this tape (backups to it still run) and actually _restore_ data from it? Second, if the data base has gotten so wedged that restores are no longer possible, Bacula should refrain from indicating successful backups to the volume, and instead mark it as `Error'. Any ideas how to resolve this? Andreas Koch
pgp1iBC5ZAf0p.pgp
Description: PGP signature