Kern Sibbald wrote:
> Hello,
>
> It appears that TLS is getting stuck indefinitely in a read because of some 
> networking error.  
>
> You might try applying the attached patch.  There is a good chance that it 
> will break the SD out of this condition.
>
> Apply the patch with:
>
> cd <bacula-source>
> patch -p2 <3.0.3-tls-stall.patch
> ./configure <your-options>
> make
> ...
> make install
>
> Feedback would be appreciated.
>
> Regards,
>
> Kern
>
> On Thursday 19 November 2009 10:07:00 Christian Gaul wrote:
>   
>> Amongst many other clients, I back up my workstation using Bacula (in
>> this case 3.0.3, but I have been seeing this since I started using Bacula
>> with version 2.2-something).
>>
>> I can see the job for my client in the director. Its status is
>> "Waiting for client XXX to connect to storage YYY", and it has been in
>> that status since I turned the workstation off (around 13 hours ago). I
>> am unable to cancel the job, because it is neither running nor scheduled.
>> None of the other jobs on the director were able to start: they are all
>> "waiting for execution", and older jobs have been canceled (thanks for
>> fixing the canceled email notification with 3.0.3, btw). This means that,
>> on this director, no nightly backups have run for any of my clients on
>> any of my SDs, because a single client got turned off in between the
>> director initializing the job and the client making the connection to
>> the SD.
>>
>> As I said, I have been seeing this behavior for a really long time now,
>> and it has caused me enough grief to set up a second director / SDs and
>> even two FDs per client. A single client, whether broken, turned off, or
>> malicious, can bring a whole director to a halt. Is there some magic
>> timeout that is set to a (useless) default value that I am missing, or is
>> it non-concurrent connection creation that is blocking all my other jobs?
>>
>> I can leave the director in this state for a couple of hours to perform
>> magic incantations (stacktrace, backtrace, etc.) if you want any
>> information about this issue.
>>
>> I'll attach the btraceback right away, along with the last log lines. But
>> since I am not running this director for testing, it isn't running at
>> any debug level.
>>
>> After reviewing the bconsole output to make it postable, it seems that
>> some jobs did run after 18:03 (the time I turned off my workstation).
>> The last job ran (to a different SD than the one that blocked) at 02:30;
>> after that, no new jobs could start, even to different SDs.
>>
>> I really appreciate the work you guys are doing on Bacula, and I would
>> love it if someone would take a look at this.
>>     
>
>
>   
I've applied the patch to the SD where the problem occurred. Since it is
an SD-only patch and doesn't change much, I don't think I'll have to
update all my other SDs.
I will keep an eye on it, but since this only happens randomly I cannot
promise much feedback (unless it explodes or gets worse).
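
For anyone following the thread, the failure mode discussed above (a read
that never returns because the peer silently went away) can be illustrated
outside Bacula. The following is only a sketch in Python under my own
assumptions, not the contents of the patch and not Bacula code:
`read_with_timeout` is a hypothetical helper, and the 0.2 second timeout is
an arbitrary value chosen for the demonstration.

```python
import socket


def read_with_timeout(sock, nbytes, timeout_s):
    """Return received bytes, or None if nothing arrives within timeout_s.

    Hypothetical helper for illustration only; not part of Bacula.
    """
    sock.settimeout(timeout_s)
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        # Without a timeout, recv() would block here for as long as the
        # peer stays silent and the connection is not torn down.
        return None


# Simulate a peer that went away silently: it never sends and never closes.
local, silent_peer = socket.socketpair()
stalled = read_with_timeout(local, 1024, 0.2)  # gives up instead of hanging

# Once the peer does send something, the same call returns the data.
silent_peer.sendall(b"hello")
received = read_with_timeout(local, 1024, 0.2)

local.close()
silent_peer.close()
```

The point of the sketch is only that a bounded wait lets the caller notice
the stall and release whatever it was holding, instead of blocking forever.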

Thanks for your time

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel