I'm throwing this up in the air for people to criticise, tell me it's not possible, or suggest improvements.

It really won't affect the vast majority of Bacula users, but for those of us backing up terabytes, or who need to keep backups for several years, it should provoke some discussion.


=====


I need to rescan entire _large_ pools of tapes in order to rebuild the database(*), and laboriously scanning in tapes for the last two weeks has given me time to think about the current shortcomings of bscan.


 Number one is that bscan only allows enough argument space to scan about
  10 tapes at a time. This is OK if you know where jobs start and end, but
  when you don't, it's problematic.

  The reason, as I read the documentation, is that bscan discards
  "incomplete" save sets (i.e., where a save set spans multiple tapes
  and not all of them are read in one invocation of bscan).
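To illustrate the "OK if you know where jobs start and end" case: if you did have a map of which volumes each job was written to, the tapes could be grouped so that no job is ever split across two bscan runs. A rough Python sketch, assuming a hypothetical `job_volumes` dict (job name to list of volume names) and a limit of 10 volumes per run:

```python
from collections import defaultdict
from typing import Dict, List


def group_volumes(job_volumes: Dict[str, List[str]], max_per_run: int = 10) -> List[List[str]]:
    """Pack volumes into batches so that no job's volumes are split across batches.

    job_volumes is a hypothetical input: job name -> volumes that job was written to.
    """
    parent: Dict[str, str] = {}

    def find(v: str) -> str:
        # Union-find root lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Volumes that share a job must be read in the same run.
    for vols in job_volumes.values():
        for v in vols:
            find(v)
        for a, b in zip(vols, vols[1:]):
            union(a, b)

    components: Dict[str, List[str]] = defaultdict(list)
    for v in parent:
        components[find(v)].append(v)

    # First-fit-decreasing packing of the components into runs.
    batches: List[List[str]] = []
    for comp in sorted(components.values(), key=len, reverse=True):
        if len(comp) > max_per_run:
            raise ValueError(f"jobs chain {len(comp)} volumes together; too many for one run")
        for batch in batches:
            if len(batch) + len(comp) <= max_per_run:
                batch.extend(comp)
                break
        else:
            batches.append(list(comp))
    return batches
```

Each returned batch would then go to a single bscan invocation. Of course, in my case this map is exactly what's missing from the catalog.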


 Number two is that bscan simply scans an entire tape from one end to the
 other without any intelligence applied to optimise speed.

 That's fine if you're only scanning one tape (although it takes nearly 3
 hours to read an LTO-2 tape), but it doesn't scale for resync jobs.
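Some back-of-envelope arithmetic on why this doesn't scale. The pool size below is made up for illustration; only the roughly 3 hours per tape comes from my experience above, and load/unload time is ignored:

```python
def drive_days(tapes: int, hours_per_tape: float = 3.0, drives: int = 1) -> float:
    """Days of continuous reading needed to scan `tapes` tapes end to end,
    spread evenly over `drives` drives (ignores load/unload time)."""
    return tapes * hours_per_tape / drives / 24


if __name__ == "__main__":
    # Hypothetical 200-tape pool, read on one or two drives.
    for drives in (1, 2):
        print(f"200 tapes, {drives} drive(s): {drive_days(200, drives=drives):.1f} days")
```

So a hypothetical 200-tape pool is 25 days of continuous reading on a single drive, before allowing for any backups that need that drive.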


(*) A misunderstanding of the expiry algorithms on my part means that I have 2 years' worth of backup tapes in the safe but metadata for only the last 4 months' files. This became obvious to me during the discussions here, and became an issue when one of the researchers discovered files that had somehow been truncated to 10 bytes approximately 6 months ago and asked for older versions to be restored.

=====

Which brings me to my wishlist, for something probably named along the lines of "bsync".

Tape handling


1: The program will be able to take a "Pool" argument and check every
   single tape in that pool, preferably checking an autochanger's index
   and loading/unloading tapes unsupervised as needed, requesting
   changeouts if the necessary tapes are not in the changer.


2: The program must be able to run while Bacula is running, using tape
   drives when Bacula is not using them. This is necessary because a large
   pool or group of pools may take weeks to fully resync, and tying up a
   drive for that period of time will (of course) interfere with backups.

   This will require some sort of cooperation with Bacula, perhaps
   checking the Bacula execute queue at the end of each tape scanned
   and standing aside until the queue is empty, or the drive is free again.
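Items 1 and 2 together might look something like the Python sketch below. It doesn't talk to Bacula or the changer itself: `changer_slots` (slot number to volume label), `drive_is_free`, `load`, `scan` and `unload` are all hypothetical stand-ins for whatever the real program would use (an autochanger inventory, a check on Bacula's job queue, the changer script and a scan of one tape).

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def plan_pool_scan(pool_volumes: Iterable[str],
                   changer_slots: Dict[int, str]) -> Tuple[List[Tuple[int, str]], List[str]]:
    """Split a pool's volumes into (slot, volume) pairs already in the changer
    and volume names an operator must be asked to load."""
    slot_of = {vol: slot for slot, vol in changer_slots.items()}
    in_changer: List[Tuple[int, str]] = []
    missing: List[str] = []
    for vol in pool_volumes:
        if vol in slot_of:
            in_changer.append((slot_of[vol], vol))
        else:
            missing.append(vol)  # request a changeout for these
    return in_changer, missing


def run_pool_scan(in_changer: List[Tuple[int, str]],
                  drive_is_free: Callable[[], bool],
                  load: Callable[[int], None],
                  scan: Callable[[str], None],
                  unload: Callable[[int], None],
                  poll_seconds: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Scan one tape at a time, standing aside for Bacula between tapes."""
    for slot, vol in in_changer:
        # Wait until Bacula no longer needs the drive before starting the next tape.
        while not drive_is_free():
            sleep(poll_seconds)
        load(slot)
        try:
            scan(vol)
        finally:
            unload(slot)  # always return the tape to its slot
```

Checking only between tapes keeps things simple, at the cost that Bacula may have to wait for up to one tape's worth of scanning (nearly 3 hours on LTO-2) before it gets the drive back.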


Scanning speedups


3:  The program will skip sections of tape where the database "knows" there
    are files (i.e., there is a file index).

3a: Alternatively, the program will "quickly" verify that a file block on
    tape is correct by reading in the first "N" records and checking that
    they tally with the database records of their positions.
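A sketch of 3a in Python. It assumes a hypothetical `read_first(start, k)` callable that positions the tape at the start of a region and returns the first k records found there, and a `known` dict mapping each catalog-known region (simplified here to a half-open range of tape file numbers) to the records the catalog expects at its start. Regions that pass are skipped as in 3; the rest would be treated as unknown:

```python
from typing import Callable, Dict, List, Sequence, Tuple

Region = Tuple[int, int]  # simplified: half-open [start, end) range of tape file numbers


def quick_verify_known(known: Dict[Region, Sequence],
                       read_first: Callable[[int, int], Sequence],
                       n: int = 5) -> Tuple[List[Region], List[Region]]:
    """Split catalog-known regions into trusted ones (skip) and suspect ones (rescan)."""
    trusted: List[Region] = []
    suspect: List[Region] = []
    for region, expected in sorted(known.items()):
        sample = list(expected[:n])
        # Read only the first few records, not the whole region.
        actual = list(read_first(region[0], len(sample)))
        if actual == sample:
            trusted.append(region)
        else:
            suspect.append(region)
    return trusted, suspect
```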


Moving on to "unknown" areas of tape


4:  The program will "hunt" for bootstrap records and use them, provided
    they contain enough data to rebuild the database entries.

4a: Having ingested the bootstrap records, the program will either fully
    verify the files' existence on tape or do it "quickly", using the
    behaviour described in 3a.



5:  For any areas of tape still "unknown" after steps 3 and 4 are
    complete (most likely incomplete backups for which a .bsr was never
    recorded), the program will then scan those areas and insert the
    files it finds into the database.
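Steps 3 to 5 amount to partitioning each tape. A Python sketch, again treating regions as half-open ranges of tape file numbers (a simplification), and assuming the catalog-known and .bsr-described regions have already been gathered by some hypothetical earlier pass:

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # simplified: half-open [start, end) range of tape file numbers


def subtract(regions: List[Interval], covered: List[Interval]) -> List[Interval]:
    """Return the parts of `regions` not overlapped by any `covered` interval."""
    result: List[Interval] = []
    for start, end in regions:
        pos = start
        for c_start, c_end in sorted(covered):
            if c_end <= pos or c_start >= end:
                continue  # no overlap with what remains of this region
            if c_start > pos:
                result.append((pos, c_start))
            pos = max(pos, c_end)
            if pos >= end:
                break
        if pos < end:
            result.append((pos, end))
    return result


def plan_unknown_areas(tape_end: int,
                       catalog: List[Interval],
                       bsr: List[Interval]) -> Dict[str, List[Interval]]:
    """Skip what the catalog knows (3), verify what a .bsr describes (4/4a),
    and fully scan whatever is left (5)."""
    unknown = subtract([(0, tape_end)], catalog)
    full_scan = subtract(unknown, bsr)
    verify_from_bsr = subtract(unknown, full_scan)
    return {"verify_from_bsr": verify_from_bsr, "full_scan": full_scan}
```

Only the `full_scan` regions would need the slow, bscan-style treatment; everything else is either skipped or spot-checked.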





_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
