Hello,

On 1/29/2006 4:23 PM, Alan Brown wrote:


I throw this up in the air for people to criticise or tell me it's not possible or suggest improvements.

For fun? Great!  :-)

It really won't affect the vast majority of Bacula users, but for those of us backing up terabytes or who need to keep backups for several years, it should provoke some discussion.

Personally, I don't fall into that category, but still I see the problem.


=====


I have a need to rescan entire _large_ pools of tapes in order to rebuild the database(*), and laboriously scanning in tapes for the last two weeks has given me time to think about the current shortcomings of bscan....


 Number one is that bscan only allows enough argument space to scan about
  10 tapes at a time. This is OK if you know where jobs start and end, but
  when you don't, it's problematic.

AFAIK, you can feed some sort of options file to bscan, just for that reason. I would have to look through the ReleaseNotes to confirm that.

  The reason for that is that, as I read the documentation, bscan discards
  "incomplete" save sets (i.e., where the save set spans multiple tapes
  and not all of them are read in one invocation of bscan).

That might be a technical necessity, but it would be a big improvement if bscan could kind of cache the incomplete data and consider it in later runs to complete jobs.
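To make the caching idea concrete, here is a minimal sketch of a pending-job cache that survives between scan runs. It is an assumption throughout, not how bscan works: a job is keyed by its (VolSessionId, VolSessionTime) pair, each volume it touches is treated as one numbered part, and the end-of-session record is assumed to say which part is the last. The class names and the JSON file are hypothetical.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional

# Hypothetical model: a job is keyed by its (VolSessionId, VolSessionTime) pair,
# and each volume it touches contributes one numbered part (0, 1, 2, ...).
JobKey = tuple[int, int]


@dataclass
class PendingJob:
    parts: set[int] = field(default_factory=set)
    last_part: Optional[int] = None  # known once the end-of-session record is seen

    def complete(self) -> bool:
        return (self.last_part is not None
                and self.parts == set(range(self.last_part + 1)))


class IncompleteJobCache:
    """Remembers partial save sets between scan runs instead of discarding them."""

    def __init__(self, path: Path):
        self.path = path
        self.jobs: dict[JobKey, PendingJob] = {}
        if path.exists():
            for rec in json.loads(path.read_text()):
                key = (rec["session_id"], rec["session_time"])
                self.jobs[key] = PendingJob(set(rec["parts"]), rec["last_part"])

    def add_part(self, key: JobKey, part: int, is_last: bool) -> bool:
        """Record one part; return True when the job has just become complete."""
        job = self.jobs.setdefault(key, PendingJob())
        job.parts.add(part)
        if is_last:
            job.last_part = part
        if job.complete():
            del self.jobs[key]  # hand the finished job to the catalog insert step
            return True
        return False

    def save(self) -> None:
        """Persist the still-incomplete jobs for the next run."""
        records = [
            {"session_id": k[0], "session_time": k[1],
             "parts": sorted(j.parts), "last_part": j.last_part}
            for k, j in self.jobs.items()
        ]
        self.path.write_text(json.dumps(records))
```

A first run that only sees tape 1 of a two-tape job would call `add_part(key, 0, False)` and `save()`; a later run that reads tape 2 loads the file, calls `add_part(key, 1, True)` and gets `True` back, so the job can be inserted instead of being thrown away.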


 Number two is that bscan simply scans an entire tape from one end to the
 other without any type of intelligence applied to optimise speeds.

 That's fine if you're only scanning one tape (although it takes nearly 3
 hours to read an LTO-2 tape), but it doesn't scale for resync jobs.
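In numbers: the scan time grows linearly with the number of tapes. Taking the roughly 3 hours per LTO-2 tape quoted above and a hypothetical pool of 100 tapes (a made-up figure for illustration), a single drive running around the clock needs

```latex
T_{\text{pool}} = N \times t_{\text{tape}} = 100 \times 3\ \text{h} = 300\ \text{h} \approx 12.5\ \text{days}
```

and that is before the drive has to be shared with regular backups.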


(*) Misunderstanding of the expiry algorithms means that I have 2 years' backup tapes in the safe but only metadata for the last 4 months' files. This became obvious to me during the discussions here and became an issue when one of the researchers discovered files which were somehow truncated to 10 bytes approximately 6 months ago and requested older versions be restored.

Sh*t. Probably too late now, but it might be worth taking the job status mails, extracting some sort of volume usage log from them (containing at least which job was written to which volume), and printing that... but of course everybody relies on the catalog and the correct setup :-|

=====

Which brings me to my wishlist, for something probably named along the lines of "bsync".

Tape handling


1: The program will be able to take a "Pool" argument and check every
   single tape in that pool, preferably checking an autochanger's index
   and loading/unloading tapes unsupervised as needed, requesting
   changeouts if the necessary tapes are not in the changer.

Basic volume management, in my opinion.

The pool argument should not be required, though - there might be cases where you need to rescan your whole tape collection without knowing which tapes belong to which pool.
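The volume-selection part of item 1, including the optional pool, could be sketched like this. Everything here is an assumption for illustration: the changer inventory is taken as `slot:barcode` lines, the catalog is reduced to a volume-to-pool mapping, and both function names are hypothetical. Without a pool, every volume known to the catalog plus every tape found in the changer is scanned.

```python
from typing import Iterable, Optional


def parse_inventory(lines: Iterable[str]) -> dict[str, int]:
    """Map barcode -> slot number from 'slot:barcode' lines; empty slots are skipped."""
    inventory: dict[str, int] = {}
    for line in lines:
        slot, _, barcode = line.strip().partition(":")
        if barcode:
            inventory[barcode] = int(slot)
    return inventory


def plan_pool_scan(volumes: dict[str, str], inventory: dict[str, int],
                   pool: Optional[str] = None) -> tuple[list[tuple[int, str]], list[str]]:
    """Return (slots to load, in slot order; volumes the operator must insert).

    `volumes` maps volume name -> pool name as far as the catalog knows.
    With pool=None every known volume plus every tape in the changer is
    scanned, covering the case where pool membership is unknown."""
    if pool is None:
        wanted = sorted(set(volumes) | set(inventory))
    else:
        wanted = sorted(v for v, p in volumes.items() if p == pool)
    loadable = sorted((inventory[v], v) for v in wanted if v in inventory)
    missing = [v for v in wanted if v not in inventory]
    return loadable, missing
```

The `missing` list is what would drive the "please insert these tapes" requests; the `loadable` list can be worked through unattended.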


2: The program must be able to run while Bacula is running, using tape
   drives when Bacula is not using them. This is necessary because a large
   pool or group of pools may take weeks to fully resync and tying up a
   drive for that period of time will (of course) interfere with backups.

Definitely. Using Bacula's current autochanger support, you can set up drives not to be selected automatically. Those drives could then be the ones you use for restores and resyncs, right?

   This will require some sort of cooperation with Bacula, perhaps
   checking the Bacula execute queue at the end of each tape scanned
   and standing aside until the queue is empty, or the drive is free again.

I would suggest implementing that sort of job in the normal Bacula operations: done by the SD, controlled by the DIR. The current volume management tools are mainly disaster recovery tools, but the scenario you describe is not exactly what I'd consider disaster recovery. Rather, it's more or less a day-to-day volume / catalog management task. Well, in your case it's month-to-month.
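For the standalone variant in item 2, the "stand aside" behaviour might look like the following minimal sketch. The three callbacks are hypothetical placeholders: how a separate program would learn whether Bacula has queued jobs or is using the drive is left open here. If the work were done by the SD under control of the DIR instead, this loop would be replaced by the Director's own scheduling.

```python
import time
from typing import Callable, Iterable


def scan_when_idle(tapes: Iterable[str],
                   drive_free: Callable[[], bool],
                   queue_empty: Callable[[], bool],
                   scan_tape: Callable[[str], None],
                   poll_seconds: float = 60.0,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Scan tapes one at a time, yielding the drive to Bacula between tapes.

    drive_free, queue_empty and scan_tape are hypothetical hooks; nothing is
    interrupted mid-tape, so a backup may wait up to one tape's scan time."""
    for tape in tapes:
        # Stand aside until nothing is queued and the drive is not in use.
        while not (queue_empty() and drive_free()):
            sleep(poll_seconds)
        scan_tape(tape)
```

Checking only between tapes keeps the logic simple, at the cost of making a queued backup wait for the current tape to finish (close to 3 hours for a full LTO-2 scan).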


Scanning speedups


3:  The program will skip sections of tape where the database "knows" there
    are files (i.e., there is a file index)

Right. It's not about producing another bscan.
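Finding the sections that step 3 can skip is essentially interval arithmetic. In this sketch, positions are assumed to be (file, block) pairs on the volume, for example taken from the start/end file and block numbers the catalog keeps per job and volume, and each known extent is treated as half-open for simplicity; a real implementation would have to match the catalog's own inclusive/exclusive conventions. Python compares tuples lexicographically, so (file, block) ordering works directly.

```python
Pos = tuple[int, int]  # hypothetical (file number, block number) on the volume


def unknown_ranges(known: list[tuple[Pos, Pos]], tape_end: Pos) -> list[tuple[Pos, Pos]]:
    """Return the [start, end) ranges of the volume not covered by any catalog extent."""
    gaps = []
    cursor: Pos = (0, 0)
    for start, end in sorted(known):
        if start > cursor:
            gaps.append((cursor, start))  # nothing in the catalog here: must be read
        cursor = max(cursor, end)  # extents may overlap; never move backwards
    if cursor < tape_end:
        gaps.append((cursor, tape_end))
    return gaps
```

Only the returned gaps need to be read; everything else can be spaced over.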

3a: Alternatively, the program will "quickly" verify that a file block on
    tape is correct by reading in the first "N" records and checking that
    they tally with the database records of their positions.

Sounds like a command option to me.
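The quick check in 3a could be as small as the sketch below, with `n` playing the role of the "N" option. The record representation, (FileIndex, path) pairs, is a hypothetical simplification of what is actually stored on tape and in the catalog. Because `islice` pulls only as many records as it needs, the rest of the section is never read.

```python
from itertools import islice
from typing import Iterable, Sequence


def quick_verify(tape_records: Iterable[tuple[int, str]],
                 catalog_records: Sequence[tuple[int, str]],
                 n: int = 10) -> bool:
    """Compare the first n (FileIndex, path) records read at a position
    with what the catalog says should be there."""
    expected = list(catalog_records[:n])
    actual = list(islice(tape_records, len(expected)))  # stop reading after n records
    # An empty catalog entry cannot confirm anything, so treat it as a failure.
    return bool(expected) and actual == expected
```

A `False` result would send that section back to a full scan rather than skipping it.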


Moving on to "unknown" areas of tape


4:  The program will "hunt" bootstrap records, if there is enough data in
    these to be able to rebuild database entries.

That's the part I don't understand - where do you hunt bootstrap records? Everything known in the catalog is already handled in steps 3 and 3a, right? So, would you feed it all the .bsr files you have stored on disk or somewhere else, or are the bootstrap records also written to tape and might be read there?

4a: Having ingested the bootstrap records, the program will either fully
    verify the files' existence on tape, or "quickly" do it, using the
    behaviour described in 3a
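One possible source for the bootstrap records, assuming the Director has been writing .bsr files to disk (the Job's Write Bootstrap directive), is simply those files. The sketch below is a deliberately simplified reader: it only splits `key=value` lines into one section per `Volume=` line and expands range values such as `VolFile=0-1` or `FileIndex=1,3-5`. Real bootstrap files support more keywords and syntax than this handles. The resulting VolFile/FileIndex ranges are what 4a would then check, fully or with the quick test from 3a.

```python
def parse_bsr(text: str) -> list[dict[str, list[str]]]:
    """Split bootstrap text into one dict per Volume= section (simplified)."""
    records: list[dict[str, list[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if "=" not in line:
            continue  # skip blank or unrecognised lines
        key, value = (part.strip() for part in line.split("=", 1))
        if key.lower() == "volume":
            records.append({})  # each Volume= line opens a new section
        if records:
            # keys may repeat within a section, so keep every value
            records[-1].setdefault(key, []).append(value.strip('"'))
    return records


def parse_range(value: str) -> list[tuple[int, int]]:
    """Turn '0-1' or '1,3-5' into inclusive (low, high) pairs."""
    ranges = []
    for piece in value.split(","):
        low, _, high = piece.strip().partition("-")
        ranges.append((int(low), int(high or low)))
    return ranges
```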



5:  For remaining "unknown" areas of tape after sections 3 and 4 are
    completed (most likely to be incomplete backups where a .bsr was never
    recorded), the program will then scan and insert files in those
    remaining "unknown" areas into the database.

Yes.
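Putting steps 3, 4a and 5 together, each volume could be cut into regions labelled by what the program should do there: "skip" where the catalog already covers it, "verify" where only a bootstrap record does, and "scan" everywhere else. As before, the (file, block) positions and half-open extents are hypothetical simplifications, and the function name is made up.

```python
from typing import Sequence

Pos = tuple[int, int]     # hypothetical (file, block) position on the volume
Extent = tuple[Pos, Pos]  # half-open [start, end)


def plan_volume(catalog: Sequence[Extent], bootstrap: Sequence[Extent],
                tape_end: Pos) -> list[tuple[str, Pos, Pos]]:
    """Label every part of the volume 'skip' (step 3), 'verify' (step 4a) or 'scan' (step 5)."""
    points = {(0, 0), tape_end}
    for start, end in [*catalog, *bootstrap]:
        points.update((start, end))
    cuts = sorted(p for p in points if p <= tape_end)

    def covered(extents: Sequence[Extent], a: Pos, b: Pos) -> bool:
        # No cut point lies strictly inside [a, b), so full containment is the test.
        return any(s <= a and b <= e for s, e in extents)

    plan: list[tuple[str, Pos, Pos]] = []
    for a, b in zip(cuts, cuts[1:]):
        if covered(catalog, a, b):
            action = "skip"      # catalog wins over bootstrap data
        elif covered(bootstrap, a, b):
            action = "verify"
        else:
            action = "scan"
        if plan and plan[-1][0] == action:
            plan[-1] = (action, plan[-1][1], b)  # merge with the previous region
        else:
            plan.append((action, a, b))
    return plan
```

Only the "scan" regions incur the full bscan-style cost; the rest is either spaced over or spot-checked.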

Well, even if I might never need your suggested enhancement, I can understand the need for it.

The resync capability you describe might well be considered a requirement in a large installation, I guess, and your proposal sounds quite reasonable.

I guess that people with a better knowledge of enterprise backup systems - David, Adam, for example - might have some suggestions.

Arno


--
IT-Service Lehmann                    [EMAIL PROTECTED]
Arno Lehmann                  http://www.its-lehmann.de


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
