On Fri, 7 Sep 2007, Dan Langille wrote:

> On 8 Sep 2007 at 2:42, Doytchin Spiridonov wrote:
>> At least we found a working solution (no concurrent jobs, because with
>> concurent jobs bacula was useless) hoping they will fix it sometime
>> when they receive enough proof that there IS a bug. You can reopen it
>> (as I'm not going to do it after I've got several times a response
>> "can't replicate, so there are no bugs") at bugs.bacula.org
> What do you suggest we do if we are unable to replicate the bug?
> What course[s] of action would you suggest?

OK, I will suggest a course of action.

I am sure that you would agree that enough people have now reported
this issue to confirm that there is a major problem with concurrent
job processing that is unrelated to any hardware issues. I am sure you
would also agree that people are running bacula for a reason, that
they expect to be able to restore their data, and that consequently
they cannot enter into a testing regime with production systems to
debug this.

I can reproduce this problem at will, but I cannot use my own systems
or any customer systems to debug it further, nor can I give anyone
else access to do so. Now that it is known that setting Max Concurrent
Jobs greater than 1 can lead to volume corruption, no system that I
manage can use concurrent jobs until the cause is known and fixed. And
this applies to everyone using bacula: test your restores regularly.

Presumably "you" (the developers, not just you personally) have test
systems on which the backed-up data is not important and which can
therefore be used to investigate this issue, and you have a way to
verify the structural integrity of the saved data volumes. You cannot
expect folks running bacula in production to have the same.

Since the developers also presumably have an interest in the
functionality of the code base, and are familiar with the structure of
that code, I would suggest that for such a major issue, a failure to
reproduce the problem after a number of successful restores is not
sufficient cause to stop investigating it: it has to be worked on
until the cause is known. Let me state again that this is a major
show-stopper problem.

Obviously Doytchin has spent considerable time on it already, and his
efforts allow both him and me, and probably many others, to run
backups with a reasonable expectation of being able to restore.

I have some spare hardware that I can probably rig up for testing, but
I have a business to run and my time is therefore limited.
I am willing, however, to assist in whatever way I can, given these
constraints.

Steve
----------------------------------------------------------------------------
Steve Thompson                 E-mail:      smt AT vgersoft DOT com
Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
39 Smugglers Path              VSW Support: support AT vgersoft DOT com
Ithaca, NY 14850
  "186,300 miles per second: it's not just a good idea, it's the law"
----------------------------------------------------------------------------

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users