In the message dated: Mon, 08 Jan 2007 18:19:37 +0100,
The pithy ruminations from Kern Sibbald on 
<Re: [Bacula-users] "resume" failed job> were:
=> On Monday 08 January 2007 14:22, Aaron Knister wrote:
=> > Thanks for the sarcasm. NOT. I come here for support, not to be  
=> > ridiculed. And i'll have you know that the restore from that backup  
=> > set did work.
=> 
=> Yes, I was sarcastic, but life is (at least I am) like that.  
=> 
=> I certainly didn't mean to ridicule you, but rather to warn you that IMO, you

Well, the original poster isn't the only one who thought your response was very 
strong.

I've had similar failures--due to client problems--in the midst of multi-TB
backups, and was extremely interested to hear about what might have been a
solution to resume the backup without starting from the beginning.

=> were doing something that is *very* unlikely to work. Perhaps I was wrong -- 
=> I guess you fell into the 1% uncertainty that I had.  I have my doubts, 
=> anyway, good luck.

I appreciate the warning, and probably won't try the same technique of mucking 
with the backup status value in the database. I think that this is a very 
interesting thread in terms of making bacula more robust. I can see many 
applications for the idea of resuming an interrupted backup (the failure that 
the original poster described, desktop backups where users may shut off 
machines, support for mobile users over unreliable network links, etc.).

=> 
=> My advice to other users, remains the same: If a Job fails, the File records 
=> will most likely not have been inserted, and in such a case, marking the job 
=> as successfully terminated will most likely result in restore failure (or 
=> screw up of some sort) later down the road because those File records are 
=> critical for most restore operations.

Hmmm.... I don't know anything about the internal structure of bacula, or much
about databases, but it seems to me that this is a serious weakness. Would it be
possible for bacula to function more like a journaling filesystem in terms of
keeping consistency?
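
(For whatever it's worth, here's what I mean in toy form. This is a completely
untested sketch from someone who, as I said, doesn't know bacula's internals;
I'm using python/sqlite only because it's self-contained, and the "File" table
below is an invented, simplified one, not bacula's real catalog schema.)

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE File (JobId INTEGER, Path TEXT, Name TEXT)")

def insert_file_records(conn, jobid, files):
    # Either every File record for the job is committed, or none are:
    # "with conn" opens a transaction, commits on success, rolls back on error,
    # so a half-inserted job can never be mistaken for a good one.
    with conn:
        conn.executemany(
            "INSERT INTO File (JobId, Path, Name) VALUES (?, ?, ?)",
            [(jobid, path, name) for path, name in files])

insert_file_records(conn, 42, [("/etc/", "hosts"), ("/etc/", "passwd")])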

Would it be possible to use the existing algorithms in bacula to write File
records to a temporary db table in anticipation that the write to storage will
succeed, and then [move/copy/insert] those records into the real table only
once the write to the storage media has been acknowledged?
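
Something roughly like this toy sketch is what I'm picturing -- again, an
invented, simplified schema and a hypothetical promote_job() step, not a
description of how bacula actually inserts its records:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE File     (JobId INTEGER, Path TEXT, Name TEXT);
    CREATE TABLE FileTemp (JobId INTEGER, Path TEXT, Name TEXT);
""")

def stage_file_record(conn, jobid, path, name):
    # Record a file we *intend* to write, before the media write is confirmed.
    conn.execute("INSERT INTO FileTemp (JobId, Path, Name) VALUES (?, ?, ?)",
                 (jobid, path, name))
    conn.commit()

def promote_job(conn, jobid):
    # Once the storage write has been acknowledged, move the staged records
    # for this job into the permanent table in a single transaction.
    with conn:
        conn.execute("INSERT INTO File SELECT * FROM FileTemp WHERE JobId = ?",
                     (jobid,))
        conn.execute("DELETE FROM FileTemp WHERE JobId = ?", (jobid,))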

Would this scheme make it possible to (rough sketch after the list):
        read the real table corresponding to the media
        read the media
        read the temporary table
        reconcile the differences (i.e., any files written to media, with
                entries in the temporary table but lacking records in the
                permanent table, would have their File records updated from the
                temporary table)
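
Continuing the same toy sketch, the reconciliation might look something like
the following, where files_on_media stands in for whatever a bscan-style read
of the volume would report, and File/FileTemp are the invented tables from the
sketch above:

def reconcile_job(conn, jobid, files_on_media):
    # files_on_media: (path, name) pairs actually found on the volume.
    for path, name in files_on_media:
        staged = conn.execute(
            "SELECT 1 FROM FileTemp WHERE JobId=? AND Path=? AND Name=?",
            (jobid, path, name)).fetchone()
        permanent = conn.execute(
            "SELECT 1 FROM File WHERE JobId=? AND Path=? AND Name=?",
            (jobid, path, name)).fetchone()
        if staged and not permanent:
            # On media and staged, but never promoted: copy the record over.
            with conn:
                conn.execute(
                    "INSERT INTO File (JobId, Path, Name) VALUES (?, ?, ?)",
                    (jobid, path, name))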

Mark "exposing my db ignorance daily" Bergman



=> 
=> > 
=> > 
=> > On Jan 8, 2007, at 4:29 AM, Kern Sibbald wrote:
=> > 
=> > > On Sunday 07 January 2007 23:03, Aaron Knister wrote:
=> > >> I solved this problem myself. I'm not sure how elegant the solution
=> > >> is, however.
=> > >>
=> > >> Using myphpadmin I changed the "JobStatus" field in the respective
=> > >> jobid's mysql entry from "f" to "T". I then re-ran the job and it
=> > >> picked up more or less where it left off.
=> > >
=> > > Yes, well congratulations.  I give you 99% probability of having  
=> > > created a set
=> > > of backups that cannot be restored.
=> > >
=> > > I *strongly* recommend that other users don't try manually  
=> > > modifying the DB
=> > > unless you understand *all* the little details of how job records  
=> > > are put
=> > > into the DB.
=> > >
=> > >>
=> > >> -Aaron
=> > >>
=> > >> On Jan 5, 2007, at 10:29 PM, Aaron Knister wrote:
=> > >>
=> > >>> Hi,
=> > >>> I recently had a backup job fail mid way. It was backing up 5
=> > >>> terabytes of data, and had written 3tb off to tape. The job stopped
=> > >>> because there were no more writable volumes in the particular volume
=> > >>> pool to which the job was assigned. I cleared up a volume however  
=> > >>> the
=> > >>> job did not resume and after a while errored out. I would like to
=> > >>> know if i can salvage the 3 terabytes that was already written to
=> > >>> tape and just continue from that point.
=> > >>>
=> > >>> Many thanks.
=> > >>>
=> > >>> -Aaron
=> > >>>

----
Mark Bergman                      [EMAIL PROTECTED]
System Administrator
Section of Biomedical Image Analysis             215-662-7310
Department of Radiology,           University of Pennsylvania

http://pgpkeys.pca.dfn.de:11371/pks/lookup?search=mark.bergman%40.uphs.upenn.edu




