Kern Sibbald wrote:
> Hello,
>
> It appears that TLS is getting stuck indefinitely in a read because of some
> networking error.
>
> You might try applying the attached patch. There is a good chance that it
> will break the SD out of this condition.
>
> Apply the patch with:
>
> cd <bacula-source>
> patch -p2 <3.0.3-tls-stall.patch
> ./configure <your-options>
> make
> ...
> make install
>
> Feedback would be appreciated.
>
> Regards,
>
> Kern
>
> On Thursday 19 November 2009 10:07:00 Christian Gaul wrote:
>
>> Amongst many other clients, I back up my workstation using bacula (in
>> this case 3.0.3, but I've been seeing this since I started using bacula
>> with version 2.2 something).
>>
>> I can see the job for my client in the director, it is in the status
>> "Waiting for client XXX to connect to storage YYY", and it has been in
>> that status since I turned it off (around 13 hours ago). I am unable to
>> cancel the job, because it is not running or scheduled and none of the
>> other jobs on the director were able to start, they are all "waiting for
>> execution" and older jobs have been canceled (thanks for fixing the
>> canceled email notification with 3.0.3 btw) which means that, on this
>> director, I have not had nightly backups run on any of my clients, on
>> any of my SDs because a single client got turned off in between the
>> director initializing the job and the client making the connection to
>> the SD.
>>
>> I've been seeing this behavior, as I said, for a really long time now,
>> and it has caused me enough grief to set up a second director / SDs and
>> even two FDs per client. A single client, let's say a broken one, one
>> being turned off or a malicious one, can bring a whole director to a
>> halt. Is there some magic timeout value that is set to a (useless)
>> default value that I am missing, or is it rather non-concurrent
>> connection creation that is blocking all my other jobs?
>>
>> I can leave the director in this state for a couple of hours to perform
>> magic incantations (stacktrace, backtrace etc) if you want any
>> information about this issue.
>>
>> I'll attach the btraceback right away, also the last log lines.. but
>> since I am not running this director for testing, it isn't running under
>> any debug levels.
>>
>> After reviewing the bconsole output to make it postable, it seems that
>> some jobs did run after 18:03 (the time I turned off my workstation),
>> the last job ran (to a different SD than the one that blocked) at 02:30,
>> after that, no new jobs, even to different SDs, could start.
>>
>> I really appreciate the work you guys are doing on bacula and I would
>> love it if someone would take a look at this.
>>
>

I've applied the patch to the SD where the problem occurred. Since it's
just an SD patch and doesn't change much, I don't think I'll have to
replace all the SD versions. I will keep an eye on it, but since this only
happens randomly I cannot promise much (unless it explodes or gets worse).

Thanks for your time

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
