Arno Lehmann wrote:
> Hello,
>
> 05.07.2007 13:07, Alfredo Marchini wrote:
>   
>> Hi,
>>
>> Arno Lehmann wrote:
>>     
>>> Hi,
>>>
>>> 05.07.2007 12:07, Alfredo Marchini wrote:
>>>   
>>>       
>>>> Hi Arno,
>>>> I don't know if you know bacula source code,
>>>>     
>>>>         
>>> A bit, but I usually look for problems in the configuration as, in my 
>>> experience, the source code is quite stable. Of course, there are 
>>> bugs, but these should be reproducible in other installations, too. 
>>> Unless I find a setup that looks unique to me, I'm assuming the source 
>>> is ok and the problem lies in the configuration or general system.
>>>
>>>   
>>>       
>>>> so I am posting some 
>>>> parameters and information from my configuration that I think may cause 
>>>> this problem, or that may be misconfigured because I don't fully 
>>>> understand the manual:
>>>>
>>>> Arno Lehmann wrote:
>>>>     
>>>>         
>>>>> Hi,
>>>>>
>>>>> 04.07.2007 17:40, Alfredo Marchini wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> Hi,
>>>>>> The system and db logs don't tell me anything about this problem; it looks as if 
>>>>>> all the director processes or threads are locked at the same time.
>>>>>> If I restart only bacula-dir, without restarting bacula-sd and the 16 
>>>>>> bacula-fd, the system starts working fine again.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> It might be possible that the DIR is busy working on the catalog (like 
>>>>> pruning data) and just needs more time. You can check this using 
>>>>> 'mysqladmin processlist', for example.
>>>>>   
>>>>>       
>>>>>           
>>>>     OK, when it happens again I'll run this test too. But if Bacula prunes jobs 
>>>> and files only when all volumes are used and no appendable volumes 
>>>> remain, then pruning can't be the problem here: I had used only 
>>>> 10 volumes of 50 GB, and 8 more volumes were available but not yet 
>>>> created.
>>>>     
>>>>         
>>> Are you saying that there are always volumes available and thus no 
>>> pruning happens?
>>>
>>>   
>>>       
>> When the error occurred I had 10 volumes used and 8 volumes available, 
>>     
>
> Good.
>
>   
>> but after 3 weeks all 18 volumes are used, and then Bacula 
>> recycles the oldest ones, whose jobs are older than 14 days 
>> (the maximum file, job and volume retention period).
>>     
>
> Do you have reasons to assume that these retention periods are 
> responsible for the problem you describe? You describe them again and 
> again, but I still don't see where this might lead to the DIR being stuck.
>
>   

No, I think everything is OK with the pool, storage, and retention periods.

>> Here's my sd config:
>>     
>
> I think we agree that the most probable location of your problem is in 
> the DIR.
>
>   
>> Here's my dir config:
>>     
> ...
>
>   
>> Ah, OK, so if I have 16 concurrent jobs going to the SD, I can set Maximum 
>> Concurrent Jobs
>> for the SD to 16 (also on the director side). Is that correct?
>>     
>
> Yes. For the DIR, you should set a higher limit, because console 
> connections count as jobs. My advice is to actually limit the number 
> of concurrent jobs in the storage resource of the DIR, and simply 
> allow enough jobs in the SD config.
>
> ...
>   
>>>> Another thing:
>>>> I've set the Messages resource of every FD to send its messages to the director.
>>>>         
> ...
>   
>> No, messages are correctly sent to director.
>>     
>
> So I don't think this is relevant.
>
>   
>>>> Last thing and I've got no more:
>>>>
>>>> If I go to the working directory of bacula-dir while it is not responding, I 
>>>> find mail files, waiting to be e-mailed to the operators, that are 
>>>> 2-3 days old, as if bacula-dir were blocked and could not send the e-mail 
>>>> (when it is working fine, the mail is sent correctly to all the operators).
>>>>     
>>>>         
>>> Obviously, when the DIR is blocked, it will not finish jobs and thus 
>>> not send mail.
>>>
>>> Does your above statement imply that your DIR is stuck for some days, 
>>> when it happens? That would probably rule out catalog performance 
>>> issues as even an underpowered database server should finish the 
>>> queries after a few days...
>>>
>>>   
>>>       
>> Yes, I found the DIR locked yesterday, but the last log entry, mail, and backup 
>> are from the night of 30-06-2007.
>>     
>
> But there were jobs running and completed after Jun 30, but before Jul 
> 5, but they didn't send mail?
>
>   

Yesterday I found 2 mail files that had not been sent. The files are dated 30-06.
And after the DIR restart (now) the mails have still not been sent.
In the logs, the last date is 30-06. So the DIR was blocked for 4 days, 
including 30-06 (I restarted the blocked DIR yesterday).

> Also, what did your status monitors (of which you have at least 16...) 
> report?
>   

Nothing about the block. Monitors are configured, but I don't have a 
monitor permanently connected.
Sometimes users connect to the DIR and use a monitor to watch the 
status of the FDs, DIR and SD.

>   
>> After that the director is locked, and I have no more info about it.
>>     
>
> Well, it seems as though you do have lots of information, but didn't 
> actually collect the relevant things to understand what's happening...
>
>   

I can't find anything that could cause this problem.

>> For the db I use MySQL 5.0, standard RPM installation, and the log 
>> shows no problems.
>> The biggest table is File, with 832432 records; FileName has 424158, Path 
>> has 31414 and Log 38362.
>>
>> If you think I need to enlarge the MySQL resources, that is not a problem.
>>     
>
> You'd get error messages (at least in the log files) if the tables 
> were full or the disk space exhausted.
>
>   
And I've got no error messages.
>> Could the problem be caused by this line in my Messages resource?
>>     
>
>   
>> catalog = all, !skipped, !saved, !terminate
>>
>> This writes a lot of data to the db; if the db is the problem, I can remove 
>> this rule.
>>     
>
> But you told us there were no database issues... did you actually 
> check (with mysqladmin processlist) what the database is doing?
>   
Yes...
Right now the database is doing nothing.
The only entry in the list is the show processlist command itself.
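
A single look at the process list only shows one moment. To confirm that the catalog stays idle while the DIR is hung, the list can be sampled several times. This is a minimal sketch, assuming the MySQL client tools are installed and that mysqladmin can log in without prompting (for example through ~/.my.cnf; otherwise add -u and -p). The function name watch_catalog is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: sample the MySQL process list a few times.
# Assumes mysqladmin can log in without prompting (e.g. via ~/.my.cnf).
watch_catalog() {
    count="${1:-5}"      # number of samples to take
    interval="${2:-10}"  # seconds between samples
    # --verbose shows full statements (like SHOW FULL PROCESSLIST);
    # --sleep repeats the command, --count limits the repetitions.
    mysqladmin --verbose --sleep="$interval" --count="$count" processlist
}
```

Calling `watch_catalog 6 10` takes six samples ten seconds apart. A long-running or locked statement from the DIR would appear in every sample, while a list that only ever contains the processlist command itself suggests the catalog is not what the DIR is waiting on.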

> I really think you should stop looking in all directions at once. Try 
> to narrow down the possible reasons for your problem, instead of 
> looking at everything at once.
>
> If you can reproduce the error, create a debug log file (trace) and 
> observe the catalog - that's my advice, at least.
>
> Alternatively, when the DIR hangs again, use strace and / or gdb to 
> see what's happening.
>
> You gave good reasons to believe that your hardware, OS and database 
> are working correctly, so stop looking there.
>
> Arno
>
>   

In about 2 weeks the DIR will block again.
I'll wait for it.
strace has many options; which ones do I need to specify to capture all the 
information useful for finding the problem?
The real problem is that I don't know what can cause the lock.
I'm desperate because I have no ideas left, and all the configuration 
files seem to be configured correctly,
or at least not configured in a way that would cause this problem.
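
As a possible starting point for the strace question, here is a minimal sketch of capturing the state of a hung DIR with gdb and strace, the two tools suggested above. It assumes the daemon's process name is bacula-dir, that the script runs as root, and that gdb, strace, pidof and timeout are installed. The function name snapshot_dir and the output directory are hypothetical, and the backtrace is only readable if the binary carries debug symbols.

```shell
#!/bin/sh
# Hypothetical helper: record what a hung bacula-dir is doing.
# Assumes the process is named "bacula-dir" and the caller is root.
snapshot_dir() {
    out="${1:-/tmp/bacula-dir-hang}"
    pid=$(pidof -s bacula-dir 2>/dev/null) || {
        echo "bacula-dir is not running"
        return 1
    }
    mkdir -p "$out"
    # Backtrace of every thread: shows which call or lock each one waits in.
    gdb -p "$pid" -batch -ex "thread apply all bt" \
        > "$out/gdb-backtrace.txt" 2>&1
    # System calls of all threads for 30 seconds:
    #   -f   follow all threads of the process
    #   -tt  timestamp each call with microseconds
    #   -s   print up to 256 bytes of each string argument
    #   -o   write the trace to a file instead of stderr
    timeout 30 strace -f -tt -s 256 -o "$out/strace.txt" -p "$pid"
    echo "wrote diagnostics to $out"
}
```

Run `snapshot_dir` while the DIR is blocked and before restarting it. If every thread in the backtrace sits in a lock or wait call and the strace output shows almost no activity, that points to a deadlock rather than a slow catalog query.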

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
