On Monday 10 September 2007 04:27, Ryan Novosielski wrote:
> Kern Sibbald wrote:
> > Hello,
> >
> > I regret to have to announce that there is a rather serious bug in
> > Bacula.
> >
> > Bacula bug #935 reports that during a restore, a large number of files
> > are missing and thus not restored.  This is really quite surprising
> > because we have a fairly extensive regression test suite that explicitly
> > tests for this kind of problem many times.
> >
> > Despite our testing, there is indeed a bug in Bacula that has the
> > following characteristics:
> >
> > 1. It happens only when multiple simultaneous Jobs are run (regardless of
> > whether or not data spooling is enabled).
> >
> > 2. It has only been observed on disk-based backups, not on tape.
> >
> > 3. Under the right circumstances (timing), it could and probably does
> > happen on tape backups.
> >
> > 4. It seems to be timing dependent, and requires multiple clients to
> > reproduce.
> >
> > 5. Analysis indicates that it happens most often when the clients are
> > slow (e.g. doing Incremental backups).
> >
> > 6. It has been verified to exist in versions 2.0.x and 2.2.x.
> >
> > 7. It should also be in version 1.38, but could not be reproduced in
> > testing, perhaps due to timing considerations or the fact that the test
> > FD daemons were version 2.2.2.
> >
> > 8. The data is correctly stored on the Volume, but incorrect index
> > (JobMedia) records are stored in the database.  (The JobMedia record
> > generated during the Volume change contains the index of the new Volume
> > rather than the previous Volume).
> >
> > 9. You can prevent the problem from occurring by either turning off
> > multiple simultaneous Jobs or by ensuring that, while running multiple
> > simultaneous Jobs, those Jobs do not span Volumes.  E.g. you could
> > manually mark Volumes as full when they are sufficiently large.
> >
> > 10. If you are not running multiple simultaneous Jobs, you will not be
> > affected by this bug.
> >
> > 11. If you are running multiple simultaneous Jobs to tapes, I believe
> > there is a reasonable probability that this problem could show up when
> > Jobs are split across tapes.
> >
> > 12. If you are running multiple simultaneous Jobs to disks, I believe
> > there is a high probability that this problem will show up when Jobs are
> > split across disk Volumes.
> >
> > I have uploaded patches to bug #935 (bugs.bacula.org) that will correct
> > version 2.2.0, 2.2.1, and 2.2.2.  The patch has been tested only on
> > version 2.2.2 and passes all regression tests as well as the specific
> > test that reproduced the problem.
> >
> > After a little more testing, I plan to release version 2.2.3 probably on
> > Monday the 10th or Tuesday.
> >
> > At this time, I do not have a patch for 2.0.x versions, and unless there
> > is some really compelling reason to create one, I would prefer not -- it
> > would not be a huge effort to back port the patch, but it would require
> > rather extensive testing.  Though it is hard to make a specific
> > recommendation, I believe that it probably will be the wisest and
> > simplest to either patch version 2.2.x if that is what you are currently
> > running, or upgrade to version 2.2.3 when it is released.
>
> My personal recommendation would be to release a patch to all versions
> back to at least 1.38.x if the bug can be verified. I know that not too many
> people are running that version anymore, but if this bug is serious
> enough that the software will not work, I would personally be worried
> that someone will use one of these versions (the latest available of the
> minor release, e.g. the latest 1.38.x) and not know that this is a
> problem. 

Yes, notifying users is a problem.  If they are not subscribed to either the 
bugs database or the announce list, they are out of luck :-(

I'm considering adding a feature that the user could enable that would 
automatically notify him of critical problems, but that won't help in this 
case.

> Theoretically what you've discovered is that all versions of 
> Bacula at least back to 2.0.x are a time bomb of sorts, and really
> should not be used at all. 

The above is a bit too simple. There is a bug and it is serious, the most 
serious one we have had in a production release, but I wouldn't go so far as 
to say that those versions should not be used at all.  Please re-read the 
announcement.

> I can't think of any such bugs in the past 
> that carry a very real risk of data loss that were on non-beta versions
> of the code, 

Yes, this is the first major bug (aside from some encryption problems).  
However, there is no data loss.  The data is there, and it *can* be
recovered; the recovery is just not automatic.

> and I think not fixing the problem in older releases would 
> not be good for Bacula's image. 

Well, I would not like to see anything bad for Bacula's image, but at the 
moment, my main concern is to tie this bug down, properly document it, and 
fix it in the current release.  My announcement was made the same day that I 
reproduced it, so it is still early in the process.  After we get the current 
version on track, we can calmly think about how to handle prior versions and,
probably much more important, how to ensure that downstream packagers
are aware of the problem.


> I'm running 2.0.3 presently, and it's 
> only 6 months old. I'm sure you can imagine there are many places that
> do not allow upgrades of major products except for certain times of the
> year.
>
> Not trying to give you a hard time, but I'm not sure how it would look
> to abandon such recent versions of software.

I haven't abandoned anything -- please re-read what I wrote.  I used the
words "would prefer not".

>
> PS: Does this affect spooled simultaneous jobs, or only simultaneous
> jobs that are simultaneously writing to storage?

Please re-read item 1 of my announcement (above).
