I'm throwing this up in the air for people to criticise, tell me it's not
possible, or suggest improvements.
It really won't affect the vast majority of Bacula users, but for those of
us backing up terabytes or who need to keep backups for several years, it
should provoke some discussion.
=====
I need to rescan entire _large_ pools of tapes in order to rebuild
the database(*). Laboriously scanning in tapes for the last two weeks
has given me time to think about the current shortcomings of bscan.
Number one is that bscan only allows enough argument space to scan about
10 tapes at a time. This is OK if you know where jobs start and end, but
when you don't, it's problematic.
The reason is that, as I read the documentation, bscan discards
"incomplete" save sets (i.e. where the save set spans multiple tapes
and not all of them are read in one invocation of bscan).
Number two is that bscan simply scans an entire tape from one end to the
other without any intelligence applied to optimise speed.
That's fine if you're only scanning one tape (although it takes nearly 3
hours to read an LTO2 tape), but it doesn't scale for resync jobs.
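
To put a rough number on the scaling problem, here is a back-of-envelope sketch. The 3 hours per tape is the figure quoted above; the pool size of 200 tapes and the hours per day that a drive is available are made-up values for illustration only, and the function name is hypothetical.

```python
import math


def estimate_rescan_days(tape_count, hours_per_tape=3.0, drive_hours_per_day=24.0):
    """Return the whole days needed to read every tape end to end on one drive.

    hours_per_tape defaults to the ~3 hours per LTO2 tape mentioned above.
    drive_hours_per_day is an assumption: how long the drive is free each day.
    """
    if tape_count < 0 or hours_per_tape <= 0 or drive_hours_per_day <= 0:
        raise ValueError("tape_count must be >= 0 and the rates must be > 0")
    total_hours = tape_count * hours_per_tape
    return math.ceil(total_hours / drive_hours_per_day)


if __name__ == "__main__":
    # 200 tapes is a hypothetical pool size, not a figure from this post.
    for free_hours in (24.0, 8.0):
        days = estimate_rescan_days(200, drive_hours_per_day=free_hours)
        print(f"drive free {free_hours:g} h/day: about {days} days")
```

Even with a drive dedicated around the clock, a pool of that size ties it up for weeks, and far longer if the drive is only free outside the backup window.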
(*) A misunderstanding of the expiry algorithms means that I have 2 years'
worth of backup tapes in the safe but only metadata for the last 4 months'
files.
This became obvious to me during the discussions here and became an issue
when one of the researchers discovered files which were somehow truncated
to 10 bytes approximately 6 months ago and requested older versions
be restored.
=====
Which brings me to my wishlist, for something probably named
along the lines of "bsync".
Tape handling
1: The program will be able to take a "Pool" argument and check every
single tape in that pool, preferably checking an autochanger's index
and loading/unloading tapes unsupervised as needed, requesting
changeouts if the necessary tapes are not in the changer.
2: The program must be able to run while Bacula is running, using tape
drives when Bacula is not using them. This is necessary because a large
pool or group of pools may take weeks to fully resync, and tying up a
drive for that period of time will (of course) interfere with backups.
This will require some sort of cooperation with Bacula, perhaps
checking the Bacula execute queue at the end of each tape scanned
and standing aside until the queue is empty, or the drive is free again.
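
Items 1 and 2 together could look something like the sketch below. Everything in it is hypothetical: `changer_slots` stands in for the autochanger's index (volume name to slot number), and `drive_is_free`, `scan_tape` and `wait` are placeholder callables for however the program would query Bacula, read a tape, and back off. None of them are existing Bacula interfaces.

```python
def split_by_changer(pool_volumes, changer_slots):
    """Partition a pool into volumes already in the changer and those that are not."""
    loaded = [v for v in pool_volumes if v in changer_slots]
    missing = [v for v in pool_volumes if v not in changer_slots]
    return loaded, missing


def resync_pool(pool_volumes, changer_slots, drive_is_free, scan_tape, wait):
    """Scan every loadable volume in the pool, yielding the drive between tapes.

    Returns (scanned, missing); 'missing' is the list of volumes an operator
    would have to be asked to load (the "changeout" request in item 1).
    """
    loaded, missing = split_by_changer(pool_volumes, changer_slots)
    scanned = []
    for volume in loaded:
        # Item 2: stand aside before each tape for as long as Bacula needs the drive.
        while not drive_is_free():
            wait()
        scan_tape(volume, changer_slots[volume])
        scanned.append(volume)
    return scanned, missing


if __name__ == "__main__":
    slots = {"A001": 1, "A003": 2}        # hypothetical changer index
    busy = iter([False, True, True])      # drive busy once, then free
    done, todo = resync_pool(
        ["A001", "A002", "A003"],
        slots,
        drive_is_free=lambda: next(busy),
        scan_tape=lambda vol, slot: print(f"scanning {vol} from slot {slot}"),
        wait=lambda: print("drive busy, waiting"),
    )
    print("scanned:", done, "need changeout:", todo)
```

Checking only between tapes keeps the cooperation simple, at the cost of possibly delaying a backup job by up to one full tape read.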
Scanning speedups
3: The program will skip sections of tape where the database "knows" there
are files (i.e. there is a file index).
3a: Alternatively, the program will "quickly" verify that a file block on
tape is correct by reading in the first "N" records and checking that
they tally with the database records of their positions.
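
A minimal sketch of the "quick" check in 3a, assuming each record can be reduced to something comparable, such as a (file index, position) tuple. That representation and the function name are my assumptions for illustration, not how bscan or the catalog actually stores things.

```python
from itertools import islice


def quick_verify(tape_records, catalog_records, n=10):
    """Return True if the first n records on tape tally with the catalog.

    Both arguments are iterables of comparable records, e.g. hypothetical
    (file_index, position) tuples. Only the first n catalog records are
    checked, so a True result is a spot check, not a full verification.
    """
    expected = list(islice(catalog_records, n))
    if not expected:
        return False  # the catalog knows nothing here, so nothing to confirm
    found = list(islice(tape_records, len(expected)))
    return found == expected


if __name__ == "__main__":
    catalog = [(1, 0), (2, 5), (3, 9), (4, 20)]
    good_tape = [(1, 0), (2, 5), (3, 9), (4, 20)]
    bad_tape = [(1, 0), (2, 6), (3, 9), (4, 20)]
    print(quick_verify(good_tape, catalog, n=3))
    print(quick_verify(bad_tape, catalog, n=3))
```

If the check fails, the safe fallback would be to treat that section as "unknown" and scan it in full.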
Moving on to "unknown" areas of tape
4: The program will "hunt" for bootstrap records, if there is enough data in
these to be able to rebuild database entries.
4a: Having ingested the bootstrap records, the program will either fully
verify the files' existence on tape, or "quickly" do it, using the
behaviour described in 3a.
5: For remaining "unknown" areas of tape after sections 3 and 4 are
completed (most likely to be incomplete backups where a .bsr was never
recorded), the program will then scan and insert files in those
remaining "unknown" areas into the database.
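
Working out which areas are still "unknown" after steps 3 and 4 is an interval-subtraction problem. The sketch below assumes positions on a tape can be flattened to a single block number and that known sections are half-open (start, end) ranges; both are simplifications of mine for illustration.

```python
def unknown_ranges(tape_end, known_ranges):
    """Return the half-open block ranges in [0, tape_end) not covered by known_ranges.

    known_ranges is an iterable of (start, end) pairs, e.g. sections already
    in the database (step 3) plus those rebuilt from bootstrap records (step 4).
    Overlapping or unsorted input is handled.
    """
    gaps = []
    cursor = 0  # first block not yet accounted for
    for start, end in sorted(known_ranges):
        # Clamp each range to the tape so out-of-bounds input cannot create gaps.
        start = min(max(start, 0), tape_end)
        end = min(max(end, 0), tape_end)
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < tape_end:
        gaps.append((cursor, tape_end))
    return gaps


if __name__ == "__main__":
    known = [(10, 30), (25, 40), (70, 100)]  # made-up block ranges
    for start, end in unknown_ranges(100, known):
        print(f"full scan needed for blocks {start}..{end - 1}")
```

Only the ranges this returns would need the slow scan-and-insert treatment from step 5; everything else is either skipped or spot-checked.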
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users