Hi all,

we have have hit upon a conundrum in our (so far very satisfying) use of 
Bacula.

Our central file server crashed while Bacula 1.38.4 was writing Job 150 to 
tape:

| 150   | DefaultBackup       | 2006-04-27 04:05:02 | B    | D     | 0        
| 0           | R         |

After restarting (and updating the kernel [different story]), we manually 
reran the backup jobs to the same tape (Thursday-0001), which completed 
successfully:

| 152   | DefaultBackup       | 2006-04-27 14:19:14 | B    | D     | 221770   
| 5623183006  | T         |
| 153   | BackupCatalog       | 2006-04-27 14:53:12 | B    | F     | 1        
| 939894373   | T         |

Note that there is no entry for job 151, which would have been the Catalog 
backup for Job 150.

However, attempting to restore any job past the failed job 150 is no longer 
possible. Here's a log for a restore attempt of job 152

03-May 14:09 erebor-dir: Start Restore Job RestoreFiles.2006-05-03_14.09.14
03-May 14:09 erebor-sd: Ready to read from volume "Thursday-0001" on device 
"LTO-2" (/dev/nst0).
03-May 14:09 erebor-sd: Forward spacing to file:block 276:0.
03-May 14:10 erebor-sd: RestoreFiles.2006-05-03_14.09.14 Error: block.c:263 
Volume data error at 276:8671! Wanted ID: "BB02", got "*". Buffer discarded.
03-May 14:10 erebor-dir: RestoreFiles.2006-05-03_14.09.14 Error: Bacula 1.38.8 
(14Apr06): 03-May-2006 14:10:57

Similarly, a bls -j on the volume Thursday-0001 ends with:

Begin Job Session Record:
JobId             : 150
VerNum            : 11
PoolName          : ThursdayTapePool
PoolType          : Backup
JobName           : DefaultBackup
ClientName        : erebor-fd
Job (unique name) : DefaultBackup.2006-04-27_04.05.00
FileSet           : FullSet
JobType           : B
JobLevel          : D
Date written      : 27-Apr-2006 09:58
02-May 19:06 bls: Got EOF at file 272  on device "LTO-2" (/dev/nst0), Volume
"Thursday-0001"
02-May 19:07 bls: Got EOF at file 273  on device "LTO-2" (/dev/nst0), Volume
"Thursday-0001"
02-May 19:07 bls: Got EOF at file 274  on device "LTO-2" (/dev/nst0), Volume
"Thursday-0001"
02-May 19:07 bls: Got EOF at file 275  on device "LTO-2" (/dev/nst0), Volume
"Thursday-0001"
02-May 19:07 bls: Got EOF at file 276  on device "LTO-2" (/dev/nst0), Volume
"Thursday-0001"
02-May 19:07 bls: bls Error: block.c:263 Volume data error at 276:8671! Wante
d ID: "BB02", got "*". Buffer discarded.
 Bacula status: file=276 block=8671
 Device status: WR_PROT ONLINE IM_REP_EN file=276 block=8672

A rather desperate attempt to fix the media record with bscan -m (now updated 
to 1.38.8) also fails with:

bscan: bscan.c:487 SOS_LABEL: Found Job record for JobId: 0
03-May 16:02 bscan: End of file 271  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
bscan: bscan.c:487 SOS_LABEL: Found Job record for JobId: 0
bscan: bscan.c:669 4,784,128 file records. At file:blk=271:3,760 
bytes=234,529,489,158
bscan: bscan.c:669 4,816,896 file records. At file:blk=271:6,236 
bytes=234,689,085,090
03-May 16:02 bscan: End of file 272  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
03-May 16:03 bscan: End of file 273  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
bscan: bscan.c:669 4,849,664 file records. At file:blk=273:36,561 
bytes=236,644,324,571
bscan: bscan.c:669 4,882,432 file records. At file:blk=273:43,492 
bytes=237,091,261,422
03-May 16:03 bscan: End of file 274  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
03-May 16:03 bscan: End of file 275  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
03-May 16:03 bscan: End of file 276  on device "LTO-2" (/dev/nst0), Volume 
"Thursday-0001"
bscan: bscan.c:669 4,915,200 file records. At file:blk=276:82,665 
bytes=239,616,955,619
bscan: bscan.c:669 4,947,968 file records. At file:blk=276:83,527 
bytes=239,672,522,633
bscan: bscan.c:669 4,980,736 file records. At file:blk=276:84,596 
bytes=239,741,454,494
03-May 16:04 bscan: bscan Error: block.c:263 Volume data error at 276:8671! 
Wanted ID: "BB02", got "*". Buffer discarded.
 Bacula status: file=276 block=8671
 Device status: WR_PROT ONLINE IM_REP_EN file=276 block=8672
Records would have been added or updated in the catalog:
      1 Media
      1 Pool
     69 Job
4982796 File

There are at least two aspects to all this. First, how can we continue to use 
this tape (backups to it still run) and actually _restore_ data from it? 
Second, if the data base has gotten so wedged that restores are no longer 
possible, Bacula should refrain from indicating successful backups to the 
volume, and instead mark it as `Error'.

Any ideas how to resolve this?

Andreas Koch

Attachment: pgp1iBC5ZAf0p.pgp
Description: PGP signature

Reply via email to