Hello,
On 1/29/2006 4:23 PM, Alan Brown wrote:
I throw this up in the air for people to criticise or tell me it's not
possible or suggest improvements.
For fun? Great! :-)
It really won't affect the vast majority of Bacula users, but for those
of us backing up Terabytes or who need to keep backups for several
years, it should provoke some discussion.
Personally, I don't fall into that category, but still I see the problem.
=====
I have a need to rescan entire _large_ pools of tapes in order to
rebuild the database(*) and laboriously scanning in tapes for the last
two weeks has given me time to think about the current shortcomings of
bscan....
Number one is that Bscan only allows enough argument space to scan about
10 tapes at a time. This is OK if you know where jobs start and end but
when you don't, it's problematic.
AFAIK, you can feed some sort of options file to bscan, just for that
reason. I would have to look through the ReleaseNotes to confirm that.
The reason is that, as I read the documentation, bscan discards
"incomplete" save sets (i.e., where the save set spans multiple tapes
and not all of them are read in one invocation of bscan).
That might be a technical necessity, but it would be a big improvement
if bscan could kind of cache the incomplete data and consider it in
later runs to complete jobs.
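To make the caching idea a bit more concrete, here is a rough sketch in Python. It is not how bscan works today: the cache file, the job names, and the "start"/"end" flags (meaning "this run saw the beginning/end of the job on tape") are all hypothetical, and a real implementation would have to cache the file records too, not just two flags.

```python
import json
from pathlib import Path
from typing import Dict, List


def merge_run(cache_file: Path, seen: Dict[str, Dict[str, bool]]) -> List[str]:
    """Merge one scan run's sightings into the cache; return jobs now complete.

    `seen` maps a (hypothetical) job name to the flags observed in this run,
    e.g. {"job1": {"start": True}} if only the beginning was on these tapes.
    """
    # Load what earlier runs left behind, if anything.
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}

    for job, flags in seen.items():
        entry = cache.setdefault(job, {"start": False, "end": False})
        entry["start"] = entry["start"] or flags.get("start", False)
        entry["end"] = entry["end"] or flags.get("end", False)

    # A job is complete once both its start and its end have been seen,
    # no matter in which run each half turned up.
    complete = sorted(j for j, e in cache.items() if e["start"] and e["end"])
    for job in complete:
        del cache[job]

    # Keep only the still-incomplete jobs for the next run.
    cache_file.write_text(json.dumps(cache))
    return complete
```

The point is only that incomplete jobs are remembered between runs rather than thrown away, so the 10-tape limit would stop mattering.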
Number two is that Bscan simply scans an entire tape from one end to the
other without any type of intelligence applied to optimise speeds.
That's fine if you're only scanning one tape (although it takes nearly 3
hours to read an LTO2 tape) but it doesn't scale for resync jobs.
(*) Misunderstanding of the expiry algorithms means that I have 2 years'
backup tapes in the safe but only metadata for the last 4 months' files.
This became obvious to me during the discussions here and became an
issue when one of the researchers discovered files which were somehow
truncated to 10 bytes approximately 6 months ago and requested older
versions be restored.
Sh*t. Probably too late now, but it might be interesting to take the job
status mails and extract from them some sort of volume use log which
records at least which job was written to which volume (tape), and
print that... but of course everybody relies on the catalog and the
correct setup :-|
=====
Which brings me to my wishlist, for something probably named along the
lines of "bsync".
Tape handling
1: The program will be able to take a "Pool" argument and check every
single tape in that pool, preferably checking an autochanger's index
and loading/unloading tapes unsupervised as needed, requesting
changeouts if the necessary tapes are not in the changer.
Basic volume management, in my opinion.
The pool argument should not be required, though - there might be cases
where you need to rescan your whole tape collection without knowing
which tapes belong to which pool.
2: The program must be able to run while Bacula is running, using tape
drives when Bacula is not using them. This is necessary because a large
pool or group of pools may take weeks to fully resync and tying up a
drive for that period of time will (of course) interfere with backups.
Definitely. Using Bacula's current autochanger support, you can set up
drives not to be selected automatically. The other ones would be the
ones you use for restores and resync, right?
This will require some sort of cooperation with Bacula, perhaps
checking the Bacula execute queue at the end of each tape scanned
and standing aside until the queue is empty, or the drive is free again.
I would suggest implementing that sort of job in the normal Bacula
operations - done by the SD, controlled by the DIR. The current volume
management tools are mainly disaster recovery tools, but the scenario
you describe is not exactly what I'd consider disaster recovery. Rather,
it's more or less a day-to-day volume / catalog management task. Well,
in your case it's month-to-month.
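Wherever it ends up living, the "stand aside" behaviour described above could look roughly like the following Python sketch. The two callbacks are hypothetical: drive_is_free() stands for whatever check against the Bacula queue or drive status turns out to be possible, and scan_volume() for the actual scan of one tape. Neither is an existing Bacula interface.

```python
import time
from typing import Callable, Iterable, List


def resync_pool(
    volumes: Iterable[str],
    drive_is_free: Callable[[], bool],
    scan_volume: Callable[[str], None],
    poll_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Scan volumes one at a time, giving way to backups between tapes."""
    scanned = []
    for volume in volumes:
        # Before touching the next tape, wait until regular backups
        # have released the drive.
        while not drive_is_free():
            sleep(poll_seconds)
        scan_volume(volume)
        scanned.append(volume)
    return scanned
```

Note that the check only happens between tapes, as in the proposal, so a backup that becomes due in the middle of a three-hour scan would still have to wait.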
Scanning speedups
3: The program will skip sections of tape where the database "knows" there
are files (ie, there is a file index)
Right. It's not about producing another bscan.
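For what it's worth, working out which parts of a tape still need reading is just interval arithmetic. A minimal Python sketch, assuming the catalog can give us, per tape, the ranges of tape file numbers it already knows about (how exactly these would be pulled from the catalog is left open, and real positions would probably need blocks as well as files):

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (first_file, last_file) on one tape


def unknown_ranges(known: List[Range], last_file: int) -> List[Range]:
    """Return the tape file ranges in 0..last_file not covered by `known`."""
    gaps = []
    next_unseen = 0  # lowest file number not yet accounted for
    for start, end in sorted(known):
        if start > next_unseen:
            gaps.append((next_unseen, start - 1))
        # Overlapping or nested known ranges must not move us backwards.
        next_unseen = max(next_unseen, end + 1)
    if next_unseen <= last_file:
        gaps.append((next_unseen, last_file))
    return gaps
```

With files 0-4 and 10-12 known on a tape that ends at file 20, only 5-9 and 13-20 would have to be read.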
3a: Alternatively, the program will "quickly" verify that a file block on
tape is correct by reading in the first "N" records and checking that
they tally with the database records of their positions.
Sounds like a command option to me.
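The quick check from 3a could be as simple as the following Python sketch. The (file_index, block) pairs are a made-up record shape for illustration; what actually gets compared would depend on what the catalog stores about record positions.

```python
from itertools import islice
from typing import Iterable, Tuple

Record = Tuple[int, int]  # hypothetical (file_index, block) pair


def quick_verify(
    tape_records: Iterable[Record],
    catalog_records: Iterable[Record],
    n: int,
) -> bool:
    """Compare only the first n records of a section against the catalog."""
    # islice stops after n items, so no more than n records are read
    # from the tape even if the section is much longer.
    from_tape = list(islice(tape_records, n))
    from_catalog = list(islice(catalog_records, n))
    # With nothing in the catalog there is nothing to confirm.
    return bool(from_catalog) and from_tape == from_catalog
```

If the check fails, the section would simply be treated as "unknown" and scanned in full.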
Moving on to "unknown" areas of tape
4: The program will "hunt" bootstrap records, if there is enough data in
these to be able to rebuild database entries
That's the part I don't understand - where do you hunt bootstrap
records? Everything known in the catalog is already handled in steps 3
and 3a, right? So, would you feed it all the .bsr files you have stored
on disk or somewhere else, or are the bootstrap records also written to
tape and might be read there?
4a: Having ingested the bootstrap records, the program will either fully
verify the files' existence on tape, or "quickly" do it, using the
behaviour described in 3a
5: For remaining "unknown" areas of tape after sections 3 and 4 are
completed (most likely to be incomplete backups where a .bsr was never
recorded), the program will then scan and insert files in those
remaining "unknown" areas into the database.
Yes.
Well, even if I might never need your suggested enhancement, I can
understand the need for it.
The resync capability you describe might well be considered a requirement
in a large installation, I guess - and your proposal sounds quite
reasonable.
I guess that people with a better knowledge of enterprise backup systems
- David, Adam, for example - might have some suggestions.
Arno
--
IT-Service Lehmann [EMAIL PROTECTED]
Arno Lehmann http://www.its-lehmann.de
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users