On Jul 10, 2011, at 3:18 PM, Steve Costaras wrote:
>
> -----Original Message-----
> From: Dan Langille [mailto:d...@langille.org]
> Sent: Sunday, July 10, 2011 12:58 PM
> To: stev...@chaven.com
> Cc: bacula-users@lists.sourceforge.net
> Subject: Re: [Bacula-users] Catastrophic error. Cannot write overflow block
> to device "LTO4"
>
> >>
> >> 2) Since everything is spooled first, there should be NO error that should
> >> cancel a job. A tape drive could fail, a tape could burst into flame; all
> >> that would be needed is for Bacula to know that there was an issue and ask
> >> the admin a simple question: do you want to fix the issue or cancel? The
> >> admin fixes the problem, and then Bacula is told to restart from the last
> >> block that was stored successfully OR, if need be, from the beginning of
> >> the spooled data file.
>
> >This I do know. Although, at first glance it seems easy to do this, it is
> >not. If it was trivial to do, I assure you, it would already be in place.
>
> >> Canceling jobs that run for days for TB's of data is just screwed up.
>
> >I suggest running smaller jobs. I don't mean to sound trite, but that really
> >is the solution. Given that the alternative is non-trivial, the sensible
> >choice is, I'm afraid, cancel the job.
>
> I'm already kicking off 20+ jobs for a single system. This does not
> work when we're talking over the 100TB/nearly 200TB mark. And when these
> errors happen it does not matter how many jobs you have, as /all/ outstanding
> jobs fail when you have concurrency (in this case all jobs that were queued
> and were not even writing to the same tape were canceled).
This sounds like a configuration issue. Queued jobs should not be cancelled
when a previous job cancels.
> This does not happen with any other enterprise backup software, not that they
> should be 100% mimicked.
> With the data sizes we have today I don't see why there are not better error
> handling checks/routines.
This is open source software. Stuff gets written because someone wants it.
Clearly, nobody who wants it has written it. That is why it does not exist.
--
Dan Langille - http://langille.org